From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases

  1. From Advanced Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases Dr. Alexander Jarasch
  2. • 5 partners + assoc. partners
 • 450 researchers
 • bundles basic research and clinical trials expertise
 • => variety of data => unstructured => heterogeneous => not connected => unFAIR
  3. DZD Data and Knowledge Management team Dr. Alexander Jarasch Justus Täger Tim Bleimehl Angela Dedie Yaroslav Zdravomyslov
  4. The Challenge Connecting data (silos) -> get new insights Easy question -> Difficult to answer
  5. The Challenge Variety of users / diversity of scientific questions
 [Figure: Scientists, Medical Doctors, Data Scientists; Graph database]
  6. Why graph? Easy scientific question
 Biological question: Are human T2D genes enzymes acting on metabolites which in turn are regulated in the pig diabetes model?
 The actual question (from a data point of view):
 Is there a connection between A and R? => 3 s to look into the Excel sheet
  7. Why graph? Easy scientific question
 The actual question (from a data point of view):
 Is there a connection between A and R? => 3 s to look into the graph
 [Figure: graph with nodes A, B, C, D, E, F, G, K, Q, R, S, U, W, Z]
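 The "is there a connection between A and R?" question is a path search. A minimal sketch in Python, assuming an undirected graph: the node names come from the slide, but the edges below are invented for illustration, since the transcript does not say how the nodes are linked.

```python
from collections import deque

# Hypothetical edges: the slide names the nodes but not their links.
EDGES = [
    ("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "F"),
    ("F", "G"), ("G", "K"), ("K", "Q"), ("Q", "R"),
    ("S", "W"), ("W", "Z"), ("Z", "U"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: return one shortest path as a node list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_adjacency(EDGES)
print(shortest_path(graph, "A", "R"))  # a path, because A and R are linked here
print(shortest_path(graph, "A", "U"))  # None, because U sits in another component
```

 A graph database answers the same question with a path query instead of hand-written traversal code.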
  8. Back to the question: Are human T2D genes enzymes acting on metabolites which in turn are regulated in the pig diabetes model?
 Genomics, human diabetic data: Genes, SNPs, Proteins, Enzymes, Pathways, Metabolites
 Metabolomics, pre-diabetic pig: Metabolites
 Lists: SNPs; loci; Genes (species 1); Proteins (species 1); Enzymes (species 1); Pathways (species 1); Metabolites (species 1); Metabolites (species 2)
 => graph
  9. Why graph? -> why not relational
 • biomedical data / healthcare data is highly connected
 • => variety of data => unstructured => heterogeneous => not connected => unFAIR
 • easy to model
 • extremely flexible / easily adaptable ("re-shaping the graph") vs. static SQL model
 • scalable (billions of nodes + relationships on a single machine)
 • easy to query (cyclic dependencies)
 • Graph Data Science library + graph embeddings
  10. Diseases are connected
 [Figure: Alzheimer's, cancer, cardiovascular diseases, diabetes, lung diseases, infectious diseases => new hypotheses]
  11. DZDconnect: Concept DZD in-house data Natural Language Processing Inferring knowledge Knowledge Graph
  12. DZDconnect: stats
 • PROD server: 323m nodes, 1.1bn relationships => 480 GB
 • DEV server: 1.1bn nodes, 4.8bn relationships
 • Single server (60 CPUs, 256 GB memory, SSDs only)
 • 4 developers
 • Neo4j Enterprise (live backup, GDS)
 • UI: Flask web server, SemSpect, Neo4j Browser
 • Visualization for interactive browsing (SemSpect by derive GmbH)
 • Bloom (semi-natural-language queries)
 Awards: Graphie Award 2018, Strata Data Award finalist 2019, bytes4diabetes Award 2020
 We have DB role model
  13. DZDconnect: data integration + ML
 [Figure: Gene, RNA, Protein connected by CODES relationships; one relationship is marked CODES*]
 • Python
 • Py2Neo, GraphIO
 • Docker Pipeline for orchestration (open source by DZD)
 • Based on integrated data => annotate / enrich
 • text matching + Natural Language Processing
 • "shortcuts" for queries (reduce #hops)
 • inferring knowledge
  14. DZDconnect: data model <-> human readable = easy to query
  15. DZDconnect: data model
  16. The Challenge User with a specific input => specific output
 [Figure: Scientist, multi-omics experiment, output, Flask app]
  17. The Challenge User "start somewhere -> explore knowledge freely"
 SemSpect interactive browsing: start from any node
 [Figure: Scientist or Medical Doctor]
  18. The Challenge User with data analysis skills / computer scientist Scientist Start from any node Cypher query language Graph Data Science
  19. Use case 1 Handle mapping identifiers of molecular entities Knowledge Graph
  20. Query "friends of a friend" on a gene level
 Example: diabetes-relevant gene 'TCF7L2'
 MATCH path=(g:Gene {sid:'TCF7L2'})-[:MAPS|SYNONYM*0..2]-(g1:Gene)
 RETURN path
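 The `*0..2` part of the pattern bounds the traversal to at most two hops over `MAPS` or `SYNONYM` relationships, in either direction. A rough Python sketch of that idea follows; it is a simplification that returns reachable nodes with their hop distance, whereas Cypher returns every matching path. All identifiers other than `TCF7L2` are placeholders, not real gene IDs.

```python
def within_hops(relationships, start, max_hops=2, rel_types=("MAPS", "SYNONYM")):
    """Return {node: hop distance} for nodes reachable from `start` in at most
    `max_hops` hops, following only the allowed relationship types and
    ignoring direction (as the undirected Cypher pattern does)."""
    adjacency = {}
    for source, rel_type, target in relationships:
        if rel_type in rel_types:
            adjacency.setdefault(source, set()).add(target)
            adjacency.setdefault(target, set()).add(source)

    distances = {start: 0}  # zero hops: the start node itself
    frontier = [start]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in distances:
                    distances[neighbour] = hop
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return distances


# Placeholder data: only 'TCF7L2' appears in the deck.
RELATIONSHIPS = [
    ("TCF7L2", "MAPS", "placeholder_id_1"),
    ("placeholder_id_1", "SYNONYM", "placeholder_id_2"),
    ("placeholder_id_2", "MAPS", "placeholder_id_3"),    # three hops away
    ("TCF7L2", "CODES", "placeholder_transcript"),       # wrong relationship type
]

reachable = within_hops(RELATIONSHIPS, "TCF7L2")
print(reachable)
```

 In this toy data `placeholder_id_3` is excluded because it is three hops away, and `placeholder_transcript` is excluded because `CODES` is not one of the allowed types.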
  21. Use case 2 Find information that is NOW connected Knowledge Graph
  22. Query for SNPs (mutations) associated with diabetes
 Output: relevant protein and its function (ontology terms)
 MATCH (tr:Trait) WHERE tr.name CONTAINS 'diabetes mellitus'
 WITH tr AS disease
 MATCH path=(disease)<-[:ASSOCIATED_WITH_TRAIT]-(asso:Association)<-[:SNP_HAS_ASSOCIATION]-(snp:SNP)-[:SNP_HAS_GENE]-(gene:Gene)-[:MAPS]-(g1:Gene)-[x:CODES]->(transcript:Transcript)-[:CODES]->(prot:Protein)-[:ASSOCIATION]->(term:Term)--(o:Ontology)
 RETURN path
  23. Use case 3 Using graph algorithms to infer new insights Natural Language Processing 
 Ontologies Knowledge Graph
  24. Google's PageRank algorithm: find the most relevant gene
 Finding ACE2, the receptor the SARS-CoV-2 virus uses to enter the cell
 • 140,000 abstracts from COVID-19-related publications
 • Named Entity Recognition of gene names
 • PageRank identified 'ACE2' as the most relevant gene
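 To make the ranking step concrete, here is a textbook power-iteration PageRank in plain Python. It is a sketch under assumptions, not the Graph Data Science implementation: the deck does not say how the abstract/gene graph was shaped, so the toy graph below simply points each (made-up) abstract at the genes it mentions, and `GENE_B` and `GENE_C` are placeholders.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over directed (source, target) edges.
    Rank held by nodes without outgoing links is spread evenly over all nodes."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Made-up mentions: abstract -> gene named in that abstract.
MENTIONS = [
    ("abstract_1", "ACE2"),
    ("abstract_2", "ACE2"),
    ("abstract_2", "GENE_B"),
    ("abstract_3", "ACE2"),
    ("abstract_4", "GENE_C"),
]

ranks = pagerank(MENTIONS)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

 In this toy graph the gene mentioned by the most abstracts ends up with the highest score, which is the effect the slide describes at the scale of 140,000 abstracts.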
  25. Who’s this ACE2-guy? source: https://www.benaroyaresearch.org/blog/post/11-things-know-about-mrna-vaccines-covid-19
  26. Use case 4 Using node embeddings to subphenotype diabetic patients
  27. DZDconnect connect raw data of diabetic patients with cancer Clinical data from 404 diabetic patients
  28. DZDconnect connect lipidomics fingerprint Lipidomics Lipidomics experiment with 116 specific lipids
  29. DZDconnect connect transcriptomics fingerprint Transcriptomics experiment with 58’345 specific Transcripts (RNAs)
  30. Transform patients: fast random projections (fastRP)
 CALL gds.fastRP.write('patients', { embeddingDimension: 50, writeProperty: 'fastrp-embedding' })
 YIELD nodePropertiesWritten
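 The idea behind fast random projection can be sketched in a few lines: give every node a sparse random vector, then repeatedly replace each vector with the average of its neighbours' vectors and accumulate the normalised results. The Python below is a simplified illustration of that idea only, not the GDS procedure called on the slide; the function name, the parameters, and the toy patient/lipid graph are all assumptions.

```python
import math
import random


def l2_normalise(vector):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def random_projection_embedding(adjacency, dimension=8, weights=(1.0, 1.0), seed=7):
    """Simplified random-projection embedding: one neighbour-averaging round
    per entry in `weights`, each round added to the result with that weight."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    sparsity = 3.0
    scale = math.sqrt(sparsity)

    def random_entry():
        # Very sparse projection: +scale or -scale with probability 1/(2s) each, else 0.
        draw = rng.random()
        if draw < 1.0 / (2.0 * sparsity):
            return scale
        if draw < 1.0 / sparsity:
            return -scale
        return 0.0

    current = {node: [random_entry() for _ in range(dimension)] for node in nodes}
    embedding = {node: [0.0] * dimension for node in nodes}
    for weight in weights:
        averaged = {}
        for node in nodes:
            neighbours = adjacency[node]
            if neighbours:
                averaged[node] = [
                    sum(current[m][i] for m in neighbours) / len(neighbours)
                    for i in range(dimension)
                ]
            else:
                averaged[node] = [0.0] * dimension
        current = averaged
        for node in nodes:
            normalised = l2_normalise(current[node])
            embedding[node] = [e + weight * c for e, c in zip(embedding[node], normalised)]
    return embedding


# Hypothetical toy graph: patients linked to the lipids measured for them.
TOY_GRAPH = {
    "patient_01": ["lipid_a", "lipid_b"],
    "patient_02": ["lipid_a", "lipid_b"],
    "patient_03": ["lipid_c"],
    "lipid_a": ["patient_01", "patient_02"],
    "lipid_b": ["patient_01", "patient_02"],
    "lipid_c": ["patient_03"],
}

embeddings = random_projection_embedding(TOY_GRAPH)
print(embeddings["patient_01"] == embeddings["patient_02"])
```

 Patients with the same neighbourhood receive the same vector, which is what makes the embeddings usable as input for a similarity search.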
  31. k-nearest neighbour clustering with k=5 representing the 5 diabetes subtypes
 [Figure: graph algorithms rearrange patient 01 to patient 05 into groups; subphenotyping of diabetic patients]
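 A k-nearest-neighbour step over node embeddings can be sketched as below, assuming cosine similarity as the measure. The two-dimensional vectors are invented so the result is easy to follow; the deck works with 50-dimensional fastRP embeddings and k=5, and it does not state how the neighbour lists are then turned into the five subtypes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest_neighbours(embeddings, k):
    """For each node, list the k other nodes with the most similar embedding."""
    result = {}
    for node, vector in embeddings.items():
        scored = [
            (cosine_similarity(vector, other), name)
            for name, other in embeddings.items()
            if name != node
        ]
        # Highest similarity first; ties broken by name for a stable order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[node] = [name for _, name in scored[:k]]
    return result


# Invented 2-D embeddings for five patients.
PATIENT_EMBEDDINGS = {
    "patient_01": [1.0, 0.1],
    "patient_02": [0.9, 0.2],
    "patient_03": [0.1, 1.0],
    "patient_04": [0.2, 0.9],
    "patient_05": [-1.0, 0.0],
}

neighbours = k_nearest_neighbours(PATIENT_EMBEDDINGS, k=1)
for patient, nearest in neighbours.items():
    print(patient, "->", nearest)
```

 The neighbour lists form a similarity graph; grouping patients into subtypes would typically need a further step on top of it, such as community detection.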
  32. DZDconnect: connect patient data with knowledge graph
 [Figure: Transcript, Gene, Synonyms, Abstract, PubMed Article, Keyword, MeSH term, Ontology term]
 Hello role-model :-)
  33. Take home message
 • Knowledge graph
 • as single point of truth
 • connect in-house data
 • scalability
 • infer new insights
 • Use cases:
 • simple and advanced (Cypher) queries
 • Graph Data Science library (PageRank, kNN)
 • Node embeddings for complex data
 • NLP
 • Visualization of graph
 • different users
 • Flask app, browser, SemSpect, …
  34. Thanks to