The webinar explores some of the current opportunities for AI within Life Science and looks ahead to what we can expect to see over the coming years. These are the accompanying slides.
3. ©PistoiaAlliance
Poll Question 1: What role do you play in your company?
A. IT
B. data scientist/informatician
C. scientist
D. information professional
E. other
5. Poll Question 2: What is your familiarity with AI/Deep learning?
A. I am using AI/Deep learning
B. I am experimenting with AI/Deep learning
C. I am aware of AI/Deep learning
D. I know next to nothing about it
9. Artificial Intelligence (AI)
Field of computer science that allows computers to “seem human” in some way by replicating human cognitive functions (e.g., learning and problem solving)
Machine Learning (ML)
Subset of AI approaches that gives computers the ability to learn from and make predictions on data without being explicitly programmed (i.e., learn on their own from new data)
Deep Learning (DL)
Simulates many (deep) hierarchical layers of neurons in the human brain: by running large amounts of data through this simulation, it develops its own understanding of the concepts inherent in the data
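To make the idea of "hierarchical layers" concrete, here is a minimal sketch of data flowing through stacked layers of artificial neurons. The layer sizes, weights, and the sigmoid activation are hypothetical values chosen for illustration; a real deep learning model would learn its weights from data rather than use hand-picked ones.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs

def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical, hand-picked weights: 2 inputs -> 3 neurons -> 2 neurons -> 1 output.
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.2, 0.4], [-0.5, 0.9, 0.3]], [0.05, -0.05]),
    ([[1.2, -0.8]], [0.0]),
]
output = forward([1.0, 0.0], layers)
```

Each layer consumes the previous layer's outputs, which is what lets deeper layers represent more abstract combinations of the raw inputs.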
10. Deep Learning (DL): Why Now?
• Storage and processing power as a cheap, on-demand utility:
• Graphics Processing Units (GPUs)
• Cloud computing allows affordable GPUs at scale
• Critical mass in open source software community
• Powerful new applications for known AI techniques (e.g., deep learning)
• Global, online AI community sharing advances daily
• Open source software from the community and tech giants (e.g., Google TensorFlow)
• Huge AI investments from tech titans who see AI as a strategic asset
• Exponential growth in data to analyze using DL. In life science:
• Electronic health records
• Genomic data
• Patient monitoring and treatment devices (e.g., EKG, Pulse, Oxygen, IV Pumps, etc.)
• Consumer biomonitoring devices (e.g., FitBit, Apple Watch, smartphones)
• Environmental data
• Data registries
• Medical literature and supporting primary data
12. Artificial Intelligence
• Machine Learning
– Reinforcement Learning: Markov Decision Processes (e.g., policy iteration)
– Supervised Learning
– Semi-supervised Learning
– Unsupervised Learning
– Classification/Regression: Instance-based (e.g., KNN, CBR); Decision Tree (e.g., Random Forest); Artificial Neural Networks (e.g., Perceptron); Bayesian Networks (e.g., Naïve Bayes); Kernel-based (e.g., SVM)
– Clustering: Density-based (e.g., DBSCAN); Hierarchical (e.g., Single-linkage); Centroid-based (e.g., K-Means); Distribution-based (e.g., Mixture of Gaussians)
– Summarization: Dimensionality Reduction (e.g., PCA, SVD); Association and Sequence models (e.g., apriori algorithm)
– Anomaly Detection: Distance-based (e.g., LOF); Model-based (e.g., MMPP); Graphical and Statistical (e.g., Exponential Smoothing)
• Knowledge Representation and Reasoning
• Automated Planning
• Natural Language Processing
• Multi-Agent Systems
• Robotics
13. Creating artificial intelligence solutions using supervised learning with a neural network:
1 Define a Narrative AI Use Case
2 Collecting and annotating data sets
3 Training via Computation
4 Independent Validation of the Algorithm
5 Deployment and Monitoring
(Example image labels: Dogs, Cats)
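The five steps on this slide can be sketched end to end in a few lines. This is a toy illustration, not the presenters' pipeline: the two numeric features, the synthetic data generator, and the single-neuron perceptron (the simplest neural network named in the taxonomy slide) are all assumptions standing in for real annotated images and a real deep network.

```python
import random

# Step 1: define the use case: label an example as "cat" (0) or "dog" (1).
LABELS = {0: "cat", 1: "dog"}

# Step 2: collect and annotate data. The two features and their
# distributions are invented for illustration only.
def make_dataset(n, rng):
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        features = [rng.gauss(center, 0.5), rng.gauss(center, 0.5)]
        data.append((features, label))
    return data

def predict(weights, bias, features):
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

# Step 3: training via computation (perceptron learning rule).
def train(data, epochs=20, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            error = label - predict(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

# Step 4: independent validation on data never seen during training.
def accuracy(weights, bias, data):
    hits = sum(predict(weights, bias, f) == y for f, y in data)
    return hits / len(data)

rng = random.Random(0)
train_set = make_dataset(200, rng)
holdout_set = make_dataset(100, rng)
weights, bias = train(train_set)
holdout_accuracy = accuracy(weights, bias, holdout_set)

# Step 5: deployment and monitoring: serve predictions and keep a log
# that can later be inspected for drift or unexpected behavior.
prediction_log = []

def classify(features):
    label = LABELS[predict(weights, bias, features)]
    prediction_log.append(label)
    return label
```

Keeping the holdout set separate from the training set is the point of step 4: accuracy measured on the training data alone would overstate how well the model generalizes.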
16. HDF5 + Deep Learning
• I/O library optimized for scale + speed
• Self-documenting container optimized for scientific data + metadata
• Users who need both features
HDF5 already integrated into every major DL Framework (TensorFlow, Caffe, Keras, etc.)
17. What does the HDF Group do?
Products
• HDF5 Community Edition + Enterprise Edition
• Connectors: ODBC + Cloud (Beta)
• Add-Ons: compression + encryption
Support
• HDF Support Packages (Basic + Pro + Premier)
• Support for h5py + PyTables + pandas (NEW)
• Training
Consulting
• HDF: new functionality + performance tuning for specific use cases
• HPC software engineering with scientific expertise
• Deep Learning expertise
19. Poll Question 3: What is your company’s primary use for AI/Deep learning?
A. Early Discovery/Pre-clinical
B. Development & Clinical
C. Imaging Analysis
D. Other
E. Don’t use AI
20. Sean Ekins, CEO, Collaborations Pharmaceuticals, Inc.
Deep Learning in Pharmaceutical Research
21. AI in Pharma is not new!
2 October, 2017
• Neural Networks
• Genetic algorithms
• SVM
• “Used” for decades
• Why it never took off:
– Compute power
– Lack of training data
– Limited support
– Most scientists did not believe them… needed a paradigm shift
– Pharma mergers culled 10,000s of scientists
DEEP LEARNING
23. Speeding drug discovery with AI
Workflow figure: HTS phenotypic screen; Molecule; Screening database; Machine learning models; Vendor library; Top scoring molecules assayed in vitro
Machine learning models: Bernoulli Naive Bayes, Logistic linear regression, AdaBoost Decision Trees, Random Forest, Support Vector Machines (SVM), Deep Neural Networks (DNN)
• Molecular pattern recognition of biological data
• Descriptors identify these patterns
• Define active and inactive features
• Used to generate predictions for drug activity at a certain target (organism, protein of interest)
26. Deep Learning in Pharmaceutical Research
• Bioinformatics
– Protein disorder
– Refine docking complexes
– Model CLIP-seq data
– High content image analysis data
– Biomarkers
– Protein contacts
– Cancer diagnosis
• Pharmaceutical
– Solubility
– Gene expression data
– Formulation
– QSAR: Merck DL outperformed random forests in 11/15 and 13/15 datasets
– Tox21
Where else could we apply DL in drug discovery? Pharmacoeconomics?
27. Gaps in Deep Learning for Pharmaceutical research
• TensorFlow
• Deeplearning4j
• Facebook (Torch)
• Microsoft (CNTK)
• Which metrics to use?
• Which descriptors?
• Are the DL models overtraining?
• Lack of prospective testing.
29. Comparison of TB Machine-Learning Models (1 µM)
Logistic Regression (LR)
Adaboosted Decision Trees (ADA)
Random Forest (RF)
Naive Bayes (BNB)
Support Vector Machines (SVM)
Deep Neural Networks (DNN)
• TB data from literature
• ~19,000 molecules
• ECFP6 descriptors
• Used previously with Bayesian methods
• Multiple metrics
• 5-fold cross-validation
• Classic ML: open source scikit-learn http://scikit-learn.org/stable/
• Deep Neural Networks (DNN) using Keras https://keras.io/ and TensorFlow www.tensorflow.org
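As a rough illustration of the evaluation scheme on this slide, the sketch below runs 5-fold cross-validation for a Bernoulli naive Bayes classifier on binary fingerprint vectors. It is not the presenters' code: the slide names scikit-learn, whereas this version uses only the standard library to stay self-contained, and the fingerprints are synthetic stand-ins (the bit count, sample size, and the rule that five bits correlate with activity are all assumptions), not ECFP6 descriptors of real TB data.

```python
import math
import random

def make_fingerprints(n, n_bits, rng):
    """Synthetic stand-in for fingerprint bit vectors; not real TB data."""
    rows = []
    for _ in range(n):
        active = rng.randint(0, 1)
        # Assumption: the first 5 bits are set more often in active molecules.
        probs = [0.8 if (active and i < 5) else 0.2 for i in range(n_bits)]
        bits = [1 if rng.random() < p else 0 for p in probs]
        rows.append((bits, active))
    return rows

def fit_bnb(rows, n_bits):
    """Bernoulli naive Bayes with Laplace smoothing."""
    model = {}
    for cls in (0, 1):
        members = [bits for bits, y in rows if y == cls]
        prior = (len(members) + 1) / (len(rows) + 2)
        p_bit = [(sum(b[i] for b in members) + 1) / (len(members) + 2)
                 for i in range(n_bits)]
        model[cls] = (math.log(prior), p_bit)
    return model

def predict_bnb(model, bits):
    """Return the class with the highest log posterior score."""
    scores = {}
    for cls, (log_prior, p_bit) in model.items():
        scores[cls] = log_prior + sum(
            math.log(p if b else 1.0 - p) for b, p in zip(bits, p_bit))
    return max(scores, key=scores.get)

def cross_validate(rows, n_bits, k=5, seed=0):
    """Return the accuracy on each of k held-out folds."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        train_rows = [r for j, f in enumerate(folds) if j != i for r in f]
        model = fit_bnb(train_rows, n_bits)
        hits = sum(predict_bnb(model, bits) == y for bits, y in test_fold)
        accuracies.append(hits / len(test_fold))
    return accuracies

rng = random.Random(42)
data = make_fingerprints(500, 32, rng)
fold_accuracies = cross_validate(data, 32)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Every molecule is used for testing exactly once and never appears in the training set of the fold that tests it, which is why the averaged score is a fairer comparison across algorithms than a single train/test split.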
30. Small scale Machine Learning comparison
• Comparing different algorithms and using FCFP6 fingerprints
• Deep learning seems to improve model ROC statistics in 4/6 cases.
• Data sets range from 100s to >300K
• All classification models
• Next steps: evaluate all the datasets in ChEMBL, PubChem, ToxCast, etc.
Korotcov et al., Submitted
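For readers unfamiliar with the "ROC statistics" mentioned above, one common summary is the area under the ROC curve (AUC). A minimal sketch follows, using the pairwise-ranking definition; the labels and scores are made-up illustrative values, and whether the authors reported exactly this statistic is an assumption.

```python
def roc_auc(labels, scores):
    """Probability that a random active outscores a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up example: 1 = active, 0 = inactive, with model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 active/inactive pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes it convenient for comparing classifiers across data sets with very different active/inactive ratios.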
31. Building Machine Learning models: Assay Central
• Curate data and build models
• Provide models and collections as jar files
Add DL algorithm to Assay Central
32. Acknowledgments
• Kim Zorn, Assay Central Guru
• Alex Clark, Assay Central
• Thomas Lane, PhD intern UNC
• Dan Russo, PhD intern Rutgers
• Jacob Gerlach, High School Intern
• Valery Tkachenko, Deep Learning Consultant
• Alex Korotcov, Deep Learning Consultant
• Thanks also to: Renee Arnold, Peter Swaan
Funding from NIGMS NIH R43GM122196
33. Poll Question 4: What is the greatest barrier to application of AI at your org?
A. Technical & skills expertise
B. Access to data
C. Data quality
D. Management support/understanding
E. Other
34. Peter Henstock - Business Technology, Pfizer Inc.
Why is pharma lagging in the AI arena whereas other industries are already transformed?
36. What does Waze do?
• Obtain public data: maps & locations
• Acquire & organize data for AI analyses
– Leverage historical traffic data
– Integrate new traffic information
• Utilize AI algorithms
– Fastest route predictions
• Present timely information through UI
37. Why Isn’t AI Working Yet for Pharma?
drugwaze
Rescreening
55% chance of new series
6 weeks $1.2MM
Optimization
14% issue series 1
Solubility cause
23% issue series 2
Safety cause
5% issue series 3
8.2 months to Phase 1
Predicted FDA approval
chance: 37%
Recommended actions:
1) Resolve the
38. Keys to Success
• Obtain public data
• Acquire & organize data for AI analyses
• Utilize AI algorithms
• Present timely information through UI
39. Need for a Chief Data Officer
Value
Proposition
https://www.123rf.com/photo_17347316_businessman-pulling-rope-on-white-background.html
$ $ $
Acquire and organize data for AI
40. Analytics First, Then AI
• Readiness for Analytics & AI
– Curated data sources
– Automated data management processes
– Structured data analytics
• “If your company isn’t good at analytics, it’s not ready for AI”
– Harvard Business Review, June 7, 2017
41. Keys to Success
• Obtain public data
• Acquire & organize data for AI analyses
• Utilize AI algorithms
• Present timely information through UI
44. AI & Pharma Skillset Intersection
https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1
Software Engineering
Bioinformatics
Architecture & Systems
Clinical
Statistics
HPC/Linux Farm
AI & Machine Learning
Scientists
45. Does Pharma Have the Right Skills?
Chart fields: Management, Business, Computer Science, Biology, Chemistry, Medicine, Law, Statistics, Physics
Chart degree levels: BS, MS/MBA, PhD/MD/JD
46. Does Pharma Have the Right Skills?
Chart fields: Management, Business, Computer Science, Biology, Chemistry, Medicine, Law, Statistics, Physics
Chart degree levels: BS, MS/MBA, PhD/MD/JD
Need depth & breadth across AI areas
48. Threat of High Salaries for “Expertise”
Paul Minton:
Waiter ($20K) → data scientist ($100K)
“As Tech Booms, Workers Turn to Coding for Career Change”. July 28, 2015 New York Times
50. AI is a harder concept to grasp
• Pharma & IT grasp replacement technologies
– Virtual machine replaces physical machine
– Cloud storage replaces local disks
– Agile replaces waterfall method
– High Throughput Screening replaces “screening”
– High Content Screening replaces imaging
• AI and Machine Learning
– Provide a data-driven complement to many disciplines
– Apply from early discovery to marketing
– Span journals, data, omics, images, decision-making
51. Volume of Tasks
• Easy to develop AI solutions around a single task
– Waze navigates
– Amazon sells
– LinkedIn links
– Facebook advertises
• Pharma/Biotech tasks are varied
– Text mining for targets
– Screening and imaging technologies
– Using ’Omics
– Drug optimization
– Clinical trials
– Patient reports and communication
– Predictions on activity, safety, trial enrollment, outcomes…
55. AI Is Having a Stifled Impact in Pharma
• Bottom-Up Proof Cycle
– Scientific domain culture
– Continually need to prove AI’s value to every group
– Leveraging 1 data set at a time for 1 AI problem
– Gains are localized to small groups
• Minimal investment
– Sitting on more data than most industries
– Failing to analyze and leverage this data
– Hiring less AI expertise than small tech startups
– Relying on expensive external collaborations
56. How to Succeed
1) Organize the data for AI
“Data, rather than software, is the barrier”
2) Invest in AI talent
“Simply downloading and ‘applying’ open-source software to your data won’t work. AI needs to be customized to your business context and data. This is why there is currently a war for the scarce AI talent that can do this work.”
3) Develop an AI strategy
“After understanding what AI can and can’t do, the next step for executives is incorporating it into their strategies. [This] is the beginning, not the end….”
“What Artificial Intelligence Can and Can’t do Now”
Harvard Business Review, Nov 9, 2016, Andrew Ng
58. The next Pistoia Alliance Discussion Webinar:
Beyond BMI: Body Composition Phenotyping in the UK Biobank
Date: October 25, 2017
Check http://www.pistoiaalliance.org/events/ for the latest information
So at a high level, there are five basic steps to building a supervised neural network to differentiate a dog from a cat in a picture.
First, we define a narrative AI use case.
Then we collect and annotate data sets related to that use case.
Then we use computation to train an algorithm to accomplish the use case.
Then we conduct an independent validation of the algorithm.
Finally, we can deploy the use case and monitor it for any issues that may come up.
But there’s a big problem….
What you see here is only a small portion of the photo that was submitted to this classification service.
This is the full photo!
So as you can see, the story of this picture isn’t that Roo is a hound. It’s that Roo is a troublemaker who just shredded the Tilkin family’s couch!
And this service missed both the couch and the culprit.
That ability to appreciate the larger context is something that AI is still weak at.