The Next Era:
Deep Learning for Biomedical Research
II-SDV Conference
Nice, France
23 - 25 April 2017
Srinivasan Parthiban
Parthys Reverse Informatics
Chennai, Tamil Nadu, India
Silicon Valley Waves of Innovation
Artificial Intelligence
Automation of intelligence
Artificial Intelligence
Machine Learning
Deep Learning
Cognitive Science
1950s: Early AI stirs excitement
1980s: ML begins to flourish
2010s: DL breakthroughs drive AI boom
Big Data Availability
The World’s Technological Capacity to Store, Communicate, and Compute Information
Hilbert, M., & Lopez, P. (2011), Science, 332 (6025), 60-65
Computational Power
TPU
Traditional Programming
Data + Program -> Computer -> Output
Traditional Programming Vs Machine Learning
Machine Learning: Abundant Data + Output -> Computer -> Program
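The contrast on this slide can be sketched in a few lines of Python. The Celsius-to-Fahrenheit task, the function names, and the sample values are illustrative assumptions, not part of the talk. The first function is a program written by hand; the second recovers an equivalent program from data and outputs alone.

```python
def fahrenheit_rule(celsius):
    """Traditional programming: a human writes the program."""
    return 1.8 * celsius + 32.0


def learn_linear_rule(inputs, outputs):
    """Machine learning: recover slope and intercept from (data, output)
    pairs with ordinary least squares for a single input variable."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    var = sum((x - mean_x) ** 2 for x in inputs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data and outputs go in; the "program" (slope, intercept) comes out.
celsius = [-10.0, 0.0, 15.0, 30.0, 100.0]
fahrenheit = [fahrenheit_rule(c) for c in celsius]
slope, intercept = learn_linear_rule(celsius, fahrenheit)


def learned_rule(c):
    """The program produced by the learning step."""
    return slope * c + intercept
```

On noise-free examples like these, the learned rule matches the hand-written one up to floating-point error.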
Machine Learning
Unsupervised learning
Supervised learning
Reinforcement learning
Optimization
"I know how to classify this data, I just need you (the classifier) to sort it."
Supervised Learning
Optimization
Monday stock prices -> Supervised learning -> Tuesday stock prices
What we know -> Supervised learning -> What we want to know
Transforms One Dataset into Another
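A minimal sketch of the idea, assuming a one-parameter linear model fitted by gradient descent. The prices, learning rate, and function name are made up for illustration. The model learns to turn what we know (Monday prices) into what we want to know (Tuesday prices).

```python
# Made-up prices, purely for illustration.
monday = [10.0, 20.0, 30.0, 40.0]    # what we know
tuesday = [11.0, 22.0, 33.0, 44.0]   # what we want to know


def fit_weight(xs, ys, learning_rate=0.0005, steps=200):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


w = fit_weight(monday, tuesday)
predicted_tuesday = w * 50.0   # prediction for an unseen Monday price
```

The optimization step is what connects the two datasets: it adjusts `w` until the transformed Monday prices line up with the Tuesday prices.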
Unsupervised Learning
Optimization
List of datapoints -> Unsupervised learning -> List of cluster labels
Groups your data
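As one possible illustration, the sketch below uses k-means on one-dimensional points; the slide does not name a specific algorithm, so the choice of k-means, the data, and the function name are assumptions. A list of datapoints goes in and a list of cluster labels comes out, with no labels supplied in advance.

```python
def kmeans_1d(points, k=2, iterations=10):
    """Tiny k-means for 1-D data with a deterministic initialization."""
    ordered = sorted(points)
    # Start each center at the middle of one of k equal slices of the data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest center.
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


datapoints = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
labels, centers = kmeans_1d(datapoints, k=2)
```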
Shallow vs. Deep; Supervised vs. Unsupervised; Neural Networks vs. Probabilistic Models
Methods shown: Boosting, Perceptron, SVM, RBM, AE, Sparse Coding, Decision Tree, GMM, Neural Net, RNN, Conv. Net, D-AE, DBN, DBM, BayesNP, SP
RBM: Restricted Boltzmann Machine, DBM: Deep Boltzmann Machine, DBN: Deep Belief Network, GMM: Gaussian Mixture Model,
AE: Autoencoder, D-AE: Denoising Autoencoder, SVM: Support Vector Machine, SP: sigma-pi (sum and product),
RNN: Recurrent Neural Network, BayesNP: Non-parametric Bayesian
Neurons and the Brain
A Mostly Complete Chart of Neural Networks (1 of 2)
A Mostly Complete Chart of Neural Networks (2 of 2)
AI is still Dumb
What You See vs. What AI Sees
Needs a lot of supervision (labeled data)
Convolutional Neural Network
Image Classification/Captioning
Training labels: Cat, Dog, Horse, Elephant, Tiger
Inputs -> Convolutional Neural Network -> Outputs
New image (?) -> "It’s an elephant!"
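The core building blocks of such a network can be sketched with NumPy. This is a minimal illustration, not a trained classifier: the 6x6 image and the hand-picked edge kernel are assumptions, and in a real CNN the kernel values are learned from the labeled training images.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the core operation of a conv layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size blocks."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


# Toy image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge detector (a learned filter in a real network).
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = max_pool(relu(conv2d(image, edge_kernel)))
```

Stacking many such convolution, activation, and pooling stages, followed by a classifier over the final feature maps, gives the kind of network that maps an input image to a label.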
Recurrent Neural Networks (RNN)
x_t: the input at time step t
s_t: the hidden state at time step t
o_t: the output at time step t
Prediction of next word:
"the clouds are in the sky"
"I grew up in France ... I speak fluent French"
The issue: Vanishing Gradient over time
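A minimal NumPy sketch of the recurrence, assuming the common vanilla form s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t). The slide defines only the symbols x_t, s_t, and o_t, so the weight matrices U, W, V, their sizes, and the toy vocabulary are assumptions. The last lines multiply the per-step Jacobians to show how the gradient signal shrinks over time.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def rnn_forward(xs, U, W, V):
    """Run a vanilla RNN over a sequence of input vectors."""
    s = np.zeros(W.shape[0])
    states, outputs = [], []
    for x in xs:
        s = np.tanh(U @ x + W @ s)       # hidden state s_t
        states.append(s)
        outputs.append(softmax(V @ s))   # o_t: distribution over next words
    return states, outputs


rng = np.random.default_rng(0)
vocab = ["the", "clouds", "are", "in", "sky"]   # toy vocabulary
hidden_size = 4
U = rng.normal(scale=0.1, size=(hidden_size, len(vocab)))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.1, size=(len(vocab), hidden_size))

sentence = ["the", "clouds", "are", "in", "the", "sky"]
xs = [np.eye(len(vocab))[vocab.index(word)] for word in sentence]  # one-hot inputs
states, outputs = rnn_forward(xs, U, W, V)

# Gradient of the last state with respect to the first state:
# one Jacobian, diag(1 - s_t^2) @ W, is multiplied in per time step.
jac = np.eye(hidden_size)
for s in reversed(states[1:]):
    jac = jac @ (np.diag(1.0 - s ** 2) @ W)
gradient_norm = np.linalg.norm(jac)
```

With small untrained weights, `gradient_norm` is already far below 1 after six steps, which is why long-range dependencies such as "France ... French" are hard for a plain RNN to learn.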
LSTM and GRU
Long Short-Term Memory (LSTM)
i - input gate
f - forget gate
o - output gate
c - memory cell
c~ - new memory cell content
Gated Recurrent Unit (GRU)
z - update gate
r - reset gate
h - hidden state
h~ - new hidden state
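The GRU symbols above can be tied together in a single update step. This is a sketch under assumptions: it follows one common formulation (biases omitted, new state h = (1 - z) * h_prev + z * h~), and conventions for the roles of z and 1 - z differ between papers. The weight names and sizes are illustrative.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h_prev, p):
    """One GRU update; p is a dict of weight matrices (biases omitted)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev)              # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev)              # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev))  # new hidden state content
    return (1.0 - z) * h_prev + z * h_tilde                  # hidden state


rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
params = {}
for name in ("Wz", "Wr", "Wh"):
    params[name] = rng.normal(scale=0.1, size=(hidden_size, input_size))
for name in ("Uz", "Ur", "Uh"):
    params[name] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

# Run the cell over a short random input sequence.
h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h = gru_step(x, h, params)
```

Because the new state is a gated blend of the old state and the candidate, information can pass through many steps largely unchanged when z stays near 0, which eases the vanishing-gradient problem.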
Design Patterns for RNN
Image captioning: "A man sitting in rooftop restaurant with his laptop"
Sentiment analysis: "II-SDV Conference is absolutely a great event"
Machine translation: "Welcome to France" -> "Bienvenue en France"
Classify image frame by frame
Image Classification: Cat
Machine image recognition and descriptive captions generated:
"A group of people shopping at an outdoor market. There are many vegetables at the fruit stand."
Vision: Deep CNN -> Language: Generating RNN
Deep Learning is Hot
Cars are now driving themselves …
The Technology is Working
March 2016: World Go Champion beaten by machine
Medical Imaging & Diagnostics
AI versus MD
What happens when diagnosis is automated?
Radiology
Pathology
Dermatology
The Algorithm Will See You Now
Automatic Detection of Metastatic Breast Cancer
Dermatologist-level classification of skin cancer with deep neural networks
Procedure for calculating inference class probabilities from training class probabilities
Source: Nature 542, 115–118 (02 February 2017)
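As far as the cited paper describes it, the probability of a coarse inference class is obtained by summing the predicted probabilities of the fine-grained training classes that fall under it. The sketch below illustrates that summation only; the class names, the grouping, and the probability values are hypothetical and not taken from the paper.

```python
# Hypothetical network output over fine-grained training classes.
training_probs = {
    "malignant_subtype_a": 0.10,
    "malignant_subtype_b": 0.25,
    "benign_subtype_a": 0.40,
    "benign_subtype_b": 0.25,
}

# Hypothetical mapping from each inference class to its training classes.
inference_groups = {
    "malignant": ["malignant_subtype_a", "malignant_subtype_b"],
    "benign": ["benign_subtype_a", "benign_subtype_b"],
}


def inference_probabilities(probs, groups):
    """Sum training-class probabilities within each inference class."""
    return {
        name: sum(probs[member] for member in members)
        for name, members in groups.items()
    }


inference_probs = inference_probabilities(training_probs, inference_groups)
predicted_class = max(inference_probs, key=inference_probs.get)
```

Training on fine-grained classes and reporting at the coarse level lets the network use detailed labels while still answering the clinically relevant question.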
First FDA Approval For Clinical Cloud-Based Deep Learning In Healthcare
Arterys
Prostate MRI: An image is worth 1,000 blood tests.
MaxwellMRI
Deep Learning spots disease early using chest X-rays
Enlitic
CT scans: Algorithms inform the cardiovascular and metabolic state of patients, and predict the risk of heart attack and stroke
Zebra Medical Vision
AI heatmap: Deals Distribution by Category
Q1’12-Q1’17 (as of 3/23/17)
Source: CB Insights
Healthcare emerges as hottest area of investment
Binding Affinity from Features of Small Molecules and Biological Targets
Molecules to Features to Properties
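To make the "molecule to features" step concrete, here is a deliberately simplified stand-in: it hashes substrings of a SMILES string into a fixed-size bit set and compares molecules with the Tanimoto coefficient. This is a toy assumption for illustration, not a chemically meaningful fingerprint; real pipelines use cheminformatics toolkits that work on the molecular graph.

```python
import hashlib


def toy_fingerprint(smiles, n_bits=1024, max_len=3):
    """Hash every substring of length 1..max_len into one of n_bits
    positions. A toy stand-in for a real structural fingerprint."""
    bits = set()
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.md5(fragment.encode("utf-8")).hexdigest()
            bits.add(int(digest, 16) % n_bits)
    return bits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


ethanol = toy_fingerprint("CCO")
propanol = toy_fingerprint("CCCO")
benzene = toy_fingerprint("c1ccccc1")

similar = tanimoto(ethanol, propanol)    # two small alcohols
dissimilar = tanimoto(ethanol, benzene)  # alcohol vs. aromatic ring
```

The resulting bit sets (or numeric descriptors) are the features a model consumes when predicting properties such as binding affinity.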
Data Repositories
Database | Unique Compounds | Experimental facts | Main data types
ChEMBL v.21 | 1,592,191 | 13,968,617 | PubChem HTS assays and data mined from literature
BindingDB | 529,618 | 1,207,821 | Experimental protein-small molecule interaction data
PubChem | >60M | >157M | Bioactivity data from HTS assays
Reaxys | >74M | >500M | Literature-mined property, activity and reaction data
SciFinder (CAS) | >111M | >80M | Experimental properties, 13C and 1H NMR spectra, reaction data
GOSTAR | >3M | >24M | Target-linked data from patents and articles
AZ IBIS | - | >150M | AZ in-house SAR data points
OCHEM | >600k | >1.2M | Mainly ADMET data collected from literature
Architecture of Adversarial Autoencoder (AAE) for New Molecule Development
Oncotarget, 2017, Vol. 8, (No. 7), pp: 10883-10890
Toxicity Prediction
Our Preliminary Model for ADMET Prediction
We have compiled a library of over 155k records across 36 different ADMET properties to facilitate 10-fold cross-validation and confirm scalability.
SMILES were used to represent the molecules in the database.
Each molecule was converted into descriptors: 856 2D/3D descriptors and 1024 unique "fingerprints".
We have implemented a process to identify the most important descriptors upfront and focus resources on those key data points.
Our analysis yielded 27 descriptors that help predict %GS.
This subset is then fed into each of the 10 algorithms for model fitting.
Our initial results are promising.
Plot: predicted (Pred) vs. observed (Obs) values
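The workflow described above (select the most informative descriptors, fit a model, check it with 10-fold cross-validation) can be sketched as follows. This is not the authors' pipeline: the data are synthetic, ridge regression stands in for the 10 unnamed algorithms, and correlation ranking stands in for the unspecified descriptor-selection process. Selection is done inside each fold here, a design choice that keeps held-out molecules from influencing which descriptors are chosen.

```python
import numpy as np


def select_top_k(X, y, k):
    """Rank descriptors by absolute Pearson correlation with the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[::-1][:k]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(A.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(A.T @ A + reg, A.T @ y)


def predict_ridge(X, w):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def cross_validated_r2(X, y, k_features=27, n_folds=10, seed=0):
    """R^2 between observed values and out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predictions = np.empty_like(y)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Choose descriptors using the training fold only.
        cols = select_top_k(X[train_idx], y[train_idx], k_features)
        w = fit_ridge(X[train_idx][:, cols], y[train_idx])
        predictions[test_idx] = predict_ridge(X[test_idx][:, cols], w)
    ss_res = np.sum((y - predictions) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in: 500 "molecules", 200 "descriptors", 5 informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 200))
true_weights = np.array([2.0, -1.5, 1.0, 0.5, -0.5])
y = X[:, :5] @ true_weights + rng.normal(scale=0.3, size=500)

r2 = cross_validated_r2(X, y)
```

The out-of-fold predictions play the role of the "Pred" axis in a predicted-versus-observed plot, and `r2` summarizes how closely they track the observed values.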
Why Deep Learning?
How do data science techniques scale with the amount of data?
Curves shown: older learning algorithms vs. deep learning
Deep Learning Frameworks
% of papers mentioning the framework in March 2017
The fraction of papers that mention the framework somewhere in the full text (anywhere, including the bibliography). For papers uploaded in March 2017, we get the numbers in this table.
% of papers | framework | has been around for (months)
9.1 | tensorflow | 16
7.1 | caffe | 37
4.6 | theano | 54
3.3 | torch | 37
2.5 | keras | 19
1.7 | matconvnet | 26
1.2 | lasagne | 23
0.5 | chainer | 16
0.3 | mxnet | 17
0.3 | cntk | 13
0.2 | pytorch | 1
0.1 | deeplearning4j | 14
The Rockstars of Deep Learning
Yoshua Bengio, Yann LeCun, Geoff Hinton, Andrew Ng
Jürgen Schmidhuber (IDSIA, Switzerland)
Compute, Data, Algorithm
Chart: % of budget (out of 100%) split across Algorithms, Data, and Compute, comparing the last decade with now
AI reads Science
Your Science Assistant
Thank you
parthi@reverseinformatics.com
