Drug Repurposing using Deep Learning on Knowledge Graphs

Discovering new drugs is a lengthy and expensive process. This means that finding new uses for existing drugs can help create new treatments in less time and at lower cost. The difficulty is in finding these potential new uses.

How do we find these undiscovered uses for existing drugs?

We can unify the available structured and unstructured data sets into a knowledge graph. This is done by fusing the structured data sets, and performing named entity extraction on the unstructured data sets. Once this is done, we can use deep learning techniques to predict latent relationships.

In this talk we will cover:

  • Building the knowledge graph
  • Predicting latent relationships
  • Using the latent relationships to repurpose existing drugs

Drug Repurposing using Deep Learning on Knowledge Graphs

  1. 1. Drug Repurposing using Deep Learning on Knowledge Graphs Or how to leverage AI to recycle (old) new drugs
  2. 2. About Us Alex Thomas is a principal data scientist at Wisecube. He's used natural language processing and machine learning with clinical data, identity data, employer and jobseeker data, and now biochemical data. Alex is also the author of Natural Language Processing with Spark NLP. Vishnu is the CTO and Founder of Wisecube AI and has over two decades of experience building data science teams and platforms. Vishnu has extensive experience with various graph databases including Neo4J, TitanDB (now JanusGraph) and more recently OrientDB and AWS Neptune.
  3. 3. Drug Discovery is Broken
     - Every year, around US$200 billion is spent globally on biomedical research
     - 75% of potential drug target research could not be reproduced
     - The number of new drugs approved per billion US$ spent on R&D has halved every 9 years since 1950
     - This trend is now called Eroom’s Law (the opposite of Moore’s law)
  4. 4. Drug Repurposing: looking for (old) new cures Given the high attrition rates, substantial costs and slow pace of new drug discovery and development, repurposing of 'old' drugs is a viable alternative. Repurposing drugs to treat both common and rare diseases is increasingly becoming an attractive proposition because it involves the use of de-risked compounds. Various data-driven and experimental approaches have been suggested for the identification of repurposable drug candidates.
  5. 5. AI (NLP + Knowledge Graphs + Deep Graph Learning) to the rescue Wisecube works with research and pharmaceutical organizations to help leverage the power of AI to accelerate drug discovery and repurposing. We are currently working with St. John’s Institute to repurpose drug candidates.
  6. 6. Wisecube Drug Repurposing Pipeline Overview
  7. 7. Pipeline Deep Dive ● Datasets ○ Ingesting Data ○ Graph Building ○ Link Prediction
  8. 8. Datasets
     ❏ Drug Repurposing Knowledge Graph (DRKG)
       ❏ “Drug Repurposing Knowledge Graph (DRKG) is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects and symptoms.”
       ❏ https://github.com/gnn4dr/DRKG
     ❏ ChEMBL
       ❏ “ChEMBL is a manually curated database of bioactive molecules with drug-like properties.”
       ❏ https://www.ebi.ac.uk/chembl/
     ❏ PubChem
       ❏ “PubChem is an open chemistry database at the National Institutes of Health (NIH).”
       ❏ https://pubchemdocs.ncbi.nlm.nih.gov/about
  9. 9. Datasets: DRKG
     ❏ DrugBank
       ❏ “DrugBank is a pharmaceutical knowledge base that is enabling major advances across the data-driven medicine industry.”
       ❏ Link: https://go.drugbank.com/
     ❏ GNBR
       ❏ “A global network of biomedical relationships derived from text”
       ❏ https://zenodo.org/record/1134693#.WqQe1GbVSL9
     ❏ Hetionet
       ❏ “Hetionet is an integrative network of biomedical knowledge assembled from 29 different databases of genes, compounds, diseases, and more.”
       ❏ https://het.io/
     ❏ StringDB
       ❏ “STRING is a database of known and predicted protein-protein interactions.”
       ❏ https://string-db.org/cgi/about
     ❏ IntAct
       ❏ “IntAct provides a freely available, open source database system and analysis tools for molecular interaction data.”
       ❏ https://www.ebi.ac.uk/intact/
     ❏ DGIdb
       ❏ “[I]nformation on drug-gene interactions and the druggable genome, mined from over thirty trusted sources.”
       ❏ https://www.dgidb.org/
  10. 10. Pipeline Deep Dive ✓ Datasets ● Ingesting Data ○ Graph Building ○ Link Prediction
  11. 11. Ingesting Data ❏ Unifying the data ❏ Loading the data ❏ Post-processing the data
  12. 12. Ingesting Data: Unification
      ❏ DrugBankID -> NCBI CID -> ChEMBLID
      ❏ PUG REST API
        ❏ https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
      ❏ PUG VIEW REST API
        ❏ https://pubchemdocs.ncbi.nlm.nih.gov/pug-view
      NCBI CID <- DrugBankID
      NCBI CID -> ChEMBLID
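The ID chain on this slide (DrugBankID -> NCBI CID -> ChEMBLID) could be sketched against PUG REST roughly as follows. This is a minimal sketch, not the pipeline's actual code: it assumes that a DrugBank ID resolves through PubChem's `name` lookup and that ChEMBL IDs appear among a compound's synonyms, neither of which is stated on the slide. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(identifier: str) -> str:
    """URL asking PubChem for the CIDs that match a name or synonym.

    Assumption: a DrugBank ID such as "DB00945" is registered as a synonym.
    """
    return f"{PUG_REST}/compound/name/{quote(identifier)}/cids/JSON"


def synonyms_url(cid: int) -> str:
    """URL asking PubChem for all synonyms recorded for one CID."""
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"


def chembl_ids_from_synonyms(payload: dict) -> list[str]:
    """Pick ChEMBL-style identifiers (CHEMBL + digits) out of a synonyms response."""
    found = []
    for info in payload.get("InformationList", {}).get("Information", []):
        for synonym in info.get("Synonym", []):
            if synonym.upper().startswith("CHEMBL") and synonym[6:].isdigit():
                found.append(synonym.upper())
    return found


def fetch_json(url: str) -> dict:
    """Perform the HTTP GET; requires network access."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

With network access, `fetch_json(cid_lookup_url("DB00945"))` would be the first hop and `chembl_ids_from_synonyms(fetch_json(synonyms_url(cid)))` the second. Identifiers that do not resolve would still need manual review.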
  13. 13. Ingesting Data: Loading
      ❏ Ingest into Graph DB
        ❏ Neptune
        ❏ CosmosDB
        ❏ Any Graph DB which supports Gremlin
      ❏ Graph DB vs Triple Store
        ❏ Most open data is in RDF triples formats (RDF/XML, Turtle, N-Triples)
        ❏ Modern Graph DBs are faster than Triple Stores

      @prefix sio: <http://semanticscience.org/resource/> .
      @prefix compound: <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/> .
      @prefix descriptor: <http://rdf.ncbi.nlm.nih.gov/pubchem/descriptor/> .
      compound:CID400516 sio:has-attribute
          descriptor:CID400516_Isomeric_SMILES ,
          descriptor:CID400516_Isotope_Atom_Count ,
          descriptor:CID400516_Molecular_Formula ,
          descriptor:CID400516_Molecular_Weight ,
          descriptor:CID400516_Mono_Isotopic_Weight ,
          descriptor:CID400516_Non-hydrogen_Atom_Count ,

      ~id    ~label    articles:String[]  source_ids:String[]  name:String        SMILES:String
      8647   COMPOUND  13961;...          CHEMBL1200689        Nitric oxide       [N]=O
      344    COMPOUND  268975;...         CHEMBL142438         Nitrogen           N#N
      18030  COMPOUND  10081;...          CHEMBL925            TYROSINE           N[C@@H](Cc1ccc(O)cc1)C(=O)O
      1534   COMPOUND  211538;...         CHEMBL1616046        HYPOCHLOROUS ACID  OCl
      18800  COMPOUND  13464;...          CHEMBL978            Methacholine       CC(=O)OC(C)C[N+](C)(C)C
      26747  COMPOUND  226005;....        CHEMBL863            Cysteine           N[C@@H](CS)C(=O)O
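The column headers in the tabular sample (`~id`, `~label`, typed property columns such as `articles:String[]`) match the vertex CSV layout used by Neptune's Gremlin bulk loader, where array values are separated by semicolons. Below is a minimal sketch of producing such a file with the standard library. The `vertices_to_csv` helper and the dictionary keys are hypothetical. The `articles` lists contain only the first ID visible on the slide, since the rest is elided there.

```python
import csv
import io

# Header copied from the slide's tabular sample.
HEADER = ["~id", "~label", "articles:String[]", "source_ids:String[]",
          "name:String", "SMILES:String"]


def vertices_to_csv(vertices):
    """Serialize vertex dicts to CSV text; list values become ';'-joined arrays."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for v in vertices:
        writer.writerow([
            v["id"],
            v["label"],
            ";".join(v["articles"]),
            ";".join(v["source_ids"]),
            v["name"],
            v["smiles"],
        ])
    return buffer.getvalue()


# Two rows from the slide; only the first article ID is known.
vertices = [
    {"id": "8647", "label": "COMPOUND", "articles": ["13961"],
     "source_ids": ["CHEMBL1200689"], "name": "Nitric oxide", "smiles": "[N]=O"},
    {"id": "344", "label": "COMPOUND", "articles": ["268975"],
     "source_ids": ["CHEMBL142438"], "name": "Nitrogen", "smiles": "N#N"},
]
csv_text = vertices_to_csv(vertices)
```

Using the `csv` module rather than string joins keeps names or SMILES strings that contain commas correctly quoted.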
  14. 14. Ingesting Data: Post-processing 1. Save predictions 2. Experts review 3. Ingest new edges
  15. 15. Pipeline Deep Dive ✓ Datasets ✓ Ingesting Data ● Graph Building ○ Link Prediction
  16. 16. Graph Building ❏ Explicit Relationships ❏ Literature-based Relationships ❏ Link Prediction Relationships
  17. 17. Graph Building: Explicit Relationships
      ❏ Explicit Relationships
        ❏ Triples data
          ❏ Inherently represents relationships
        ❏ Tabular data (flattened graph)
          ❏ 2 (or more) entities or IDs in each row
          ❏ Need to determine which fields are associated with which entity or edge
        ❏ RDBMS data
          ❏ Foreign keys
          ❏ Join tables
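To illustrate the "tabular data (flattened graph)" case, here is a minimal sketch that splits each row into two nodes and one edge. The table, its column names, and the assignment of fields to entities or edges are all hypothetical. In practice that assignment is the manual decision the slide refers to.

```python
import csv
import io

# Hypothetical flattened table: every row mentions two entities by ID.
TABLE = """drug_id,drug_name,gene_id,gene_symbol,action
D1,DrugA,G1,GENE1,inhibitor
D1,DrugA,G2,GENE2,agonist
D2,DrugB,G1,GENE1,inhibitor
"""


def rows_to_graph(text):
    """Return (nodes, edges): nodes keyed by ID, edges as (source, type, target)."""
    nodes, edges = {}, []
    for row in csv.DictReader(io.StringIO(text)):
        # Fields assigned to the compound entity.
        nodes[row["drug_id"]] = {"label": "COMPOUND", "name": row["drug_name"]}
        # Fields assigned to the gene entity.
        nodes[row["gene_id"]] = {"label": "GENE", "name": row["gene_symbol"]}
        # The remaining field describes the relationship itself.
        edges.append((row["drug_id"], row["action"], row["gene_id"]))
    return nodes, edges


nodes, edges = rows_to_graph(TABLE)
```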
  18. 18. Graph Building: from Literature
      ❏ Heuristic vs Model
        ❏ Relationship extraction data sets are rare, compared to NER models
        ❏ Creating labels requires experts
      ❏ Heuristics with labels
        ❏ Stated relationships may span across multiple sentences
        ❏ Certain styles of language are excessively verbose
          ❏ Especially academic language
  19. 19. Graph Building: from Literature
      1. Given two terms, u and v
      2. Calculate TF.IDF for extracted entities
      3. Sum TF.IDF for u and v over all documents
         • TF.IDF(u), TF.IDF(v)
      4. Identify documents where u and v share a context
         • Sentence, window, paragraph, whole document
      5. Sum TF.IDF for u and v over all documents where u and v share a context
         • TF.IDF(u,v)
      6. The weight for the potential u~v edges is the ratio of these two sums
      7. Accept edges over chosen threshold
         • Top 10%
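One possible reading of the steps above, as a minimal sketch: the shared context is taken to be the whole document, TF.IDF is computed as relative term frequency times log(N / document frequency), and the edge weight is the shared-context sum divided by the all-documents sum. The slide does not fix any of these choices, so treat them as assumptions. The toy documents are hypothetical lists of already-extracted entities.

```python
import math
from collections import Counter


def tfidf_by_doc(docs):
    """docs: list of entity lists. Returns one {entity: tf-idf} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter(entity for doc in docs for entity in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            entity: (count / len(doc)) * math.log(n_docs / doc_freq[entity])
            for entity, count in counts.items()
        })
    return scores


def edge_weight(u, v, scores):
    """Ratio of TF.IDF mass in documents containing both u and v to the total mass."""
    total = sum(s.get(u, 0.0) + s.get(v, 0.0) for s in scores)
    shared = sum(s[u] + s[v] for s in scores if u in s and v in s)
    return shared / total if total else 0.0


# Hypothetical corpus: each inner list holds the entities found in one document.
docs = [
    ["aspirin", "cox1", "aspirin"],
    ["aspirin", "cox1"],
    ["cox1", "tp53"],
    ["tp53"],
    ["aspirin"],
]
scores = tfidf_by_doc(docs)
weight_aspirin_cox1 = edge_weight("aspirin", "cox1", scores)
weight_aspirin_tp53 = edge_weight("aspirin", "tp53", scores)  # never co-occur
```

Step 7 would then sort all candidate pairs by weight and keep those above the chosen cut-off, for example the top 10%.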
  21. 21. Pipeline Deep Dive ✓ Datasets ✓ Ingesting Data ✓ Graph Building ● Link Prediction
  22. 22. Link Prediction
      ❏ Untyped models
        ❏ Jaccard
        ❏ DeepWalk
      ❏ Typed Models
        ❏ TransE-L2
      ❏ DGL
        ❏ “Deep Graph Library (DGL) is a Python package built for easy implementation of graph neural network model family, on top of existing DL frameworks (currently supporting PyTorch, MXNet and TensorFlow).”
        ❏ https://docs.dgl.ai/
  23. 23. Link Prediction: Jaccard
      ❏ Intuition
        ❏ Unconnected nodes which are connected to many of the same nodes may be connected
      ❏ Pros
        ❏ No training necessary
      ❏ Cons
        ❏ Intuition is unrealistic
      ❏ Jaccard similarity
        ❏ For nodes u and v
        ❏ N(u): set of nodes connected to u
        ❏ N(v): set of nodes connected to v
        ❏ Jaccard similarity is |N(u) intersect N(v)| / |N(u) union N(v)|
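The Jaccard formula on this slide maps directly to set operations. A minimal sketch, with a hypothetical adjacency dictionary mapping each node to its neighbor set:

```python
def jaccard(graph, u, v):
    """|N(u) intersect N(v)| / |N(u) union N(v)|, or 0.0 if both have no neighbors."""
    neighbors_u = graph.get(u, set())
    neighbors_v = graph.get(v, set())
    union = neighbors_u | neighbors_v
    return len(neighbors_u & neighbors_v) / len(union) if union else 0.0


# Hypothetical toy graph: node -> set of neighbors.
graph = {
    "drugA": {"gene1", "gene2", "gene3"},
    "drugB": {"gene2", "gene3", "gene4"},
    "drugC": {"gene5"},
}

score_ab = jaccard(graph, "drugA", "drugB")  # 2 shared out of 4 distinct neighbors
score_ac = jaccard(graph, "drugA", "drugC")  # no shared neighbors
```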
  24. 24. DeepWalk
      ❏ Intuition
        ❏ A node can be characterized by the paths it occurs in
        ❏ Creates embeddings (vector representations)
      ❏ Pros
        ❏ Easy to train as it relies on models used in NLP
      ❏ Cons
        ❏ Does not take into account the edge type
      ❏ DeepWalk
        ❏ For each node u, generate K random paths of length L with u in the middle of the path
        ❏ Using these paths, build a model to predict u given the nodes before and after it
      ❏ Model
        ❏ Build a model to predict if two nodes (represented by their embeddings) are connected
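The path-generation step can be sketched with the standard library alone. This minimal sketch assumes an undirected graph stored as adjacency lists and an odd path length L, so that u sits exactly in the middle. It builds each path from two independent random walks that start at u. The embedding step (a skip-gram or CBOW style model trained on these paths) and the final connection classifier are not shown. All names are hypothetical.

```python
import random


def random_walk(graph, start, steps, rng):
    """Walk up to `steps` edges from `start`, choosing neighbors uniformly."""
    walk = [start]
    for _ in range(steps):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def centered_paths(graph, node, k, length, rng):
    """Generate k paths of `length` nodes (odd) with `node` in the middle."""
    half = (length - 1) // 2
    paths = []
    for _ in range(k):
        # Left half: walk away from the node, drop the node itself, reverse.
        left = random_walk(graph, node, half, rng)[:0:-1]
        # Right half: a second walk that starts at the node.
        right = random_walk(graph, node, half, rng)
        paths.append(left + right)
    return paths


# Hypothetical undirected toy graph as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
paths = centered_paths(graph, "c", k=3, length=5, rng=random.Random(0))
```

Each path plays the role of a sentence and each node the role of a word, which is why NLP embedding models can be reused here.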
  25. 25. TransE L2
      ❏ Intuition
        ❏ Learn embeddings that directly predict edges
      ❏ Pros
        ❏ Directly predicts edges
        ❏ After embeddings are built, no additional model is needed
        ❏ Learns representation for relationships
      ❏ Cons
        ❏ More sophisticated model (more parameters) takes longer to train
      ❏ TransE L2
        ❏ u, v are node representations (vectors)
        ❏ r is an edge type representation
        ❏ Train model that assumes ||u + r - v||_2 = 0 if u and v are connected by an edge of type r
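Once embeddings exist, scoring a candidate edge under the TransE L2 assumption is a single distance computation. The minimal sketch below uses hand-written two-dimensional vectors as stand-ins for trained embeddings. The vectors, names, and the `rank_tails` helper are hypothetical, and training is not shown.

```python
import math


def transe_l2_distance(u, r, v):
    """||u + r - v||_2; values near 0 suggest an edge of type r from u to v."""
    return math.sqrt(sum((ui + ri - vi) ** 2 for ui, ri, vi in zip(u, r, v)))


def rank_tails(head, relation, candidates):
    """Sort candidate tail nodes by distance, most plausible first."""
    return sorted(
        candidates,
        key=lambda name: transe_l2_distance(head, relation, candidates[name]),
    )


# Hypothetical stand-ins for trained embeddings.
drug = [1.0, 0.0]
treats = [0.0, 1.0]
diseases = {
    "disease_x": [1.0, 1.0],   # exactly drug + treats
    "disease_y": [3.0, -1.0],
}

ranking = rank_tails(drug, treats, diseases)
distance_x = transe_l2_distance(drug, treats, diseases["disease_x"])
```

Ranking candidate tails this way is what makes a separate link classifier unnecessary, in contrast to the DeepWalk setup.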
  26. 26. Research Case Study: Early Results We worked with St. John’s Institute (part of Providence Healthcare) to repurpose drugs to inhibit a kinase target related to Alzheimer's disease and have submitted the first round of drug candidates for expert review.
  27. 27. In Summary
      • Drug discovery scientists are drowning in disjoint datasets, and bringing new drugs to market is expensive and slow
      • Drug repurposing is one way to bring new cures using old drugs
      • NLP, knowledge graphs and deep graph learning are key to leveraging the combined knowledge of experimental and literature-based evidence for accelerating drug repurposing and research
  28. 28. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
