The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013


This talk will illustrate the power and flexibility of graph databases, and Neo4j specifically, for analyzing biological data sets. Davy will show how to build a visual exploration environment that helps researchers identify clusters within various biological data sets, including gene expression and mutation prevalence data. He will also demo BRAIN (Bio Relations and Intelligence Network), a data exploration platform that combines several scientific data sources (including PubMed, Swiss-Prot and DrugBank). It uses Neo4j under the hood both to store the data and to enable powerful querying that yields key insights and deductions.

Published in: Technology, Health & Medicine

  1. GraphConnect
  2. the power of graphs to analyze biological data
  3. about me
     who am i ...
     ➡ big data architect @ datablend - continuum
       • provide big data and nosql consultancy
       • 5 years of hands-on expertise in the pharma/biotech sector
     Davy Suvee @DSUVEE
  4. big data in pharma
     ➡ massive data
       • scalable number crunching platform
     ➡ complex data
       • visual insights-driven platform
     full genome sequencing, biological networks: graphs!!
  5. big data in pharma (2 specific use cases)
     ➡ outlier detection platform
       • neo4j, mongodb/cassandra and gephi
     ➡ euretos - brain
       • neo4j, mongodb, solr and prefuse
  6. gene expression clustering
     ➡ oncology data set:
       ★ 4,800 samples
       ★ 27,000 genes
     ➡ Question:
       ★ for a particular subset of samples, which genes are co-expressed?
  7. storing gene expressions (mongodb)

         { "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
           "sample_name" : "122551hp133a21.cel",
           "genomics_id" : 122551,
           "sample_id" : 343981,
           "donor_id" : 143981,
           "sample_type" : "Tissue",
           "sample_site" : "Ascending colon",
           "pathology_category" : "MALIGNANT",
           "pathology_morphology" : "Adenocarcinoma",
           "pathology_type" : "Primary malignant neoplasm of colon",
           "primary_site" : "Colon",
           "expressions" : [ { "gene" : "X1_at", "expression" : 5.54217719084415 },
                             { "gene" : "X10_at", "expression" : 3.92335121981739 },
                             { "gene" : "X100_at", "expression" : 7.81638155662255 },
                             { "gene" : "X1000_at", "expression" : 5.44318512260619 },
                             … ] }

  8. correlating samples (mongodb/map-reduce)
     x: 43  21  25  42  57  59
     y: 99  65  79  75  87  81
     pearson correlation: 0.52
  9. co-expression graph (neo4j)
     ➡ create a node for each sample
     ➡ if correlation between two samples >= 0.8
       • create an edge between both nodes
     (diagram: sample nodes 122551, 122552 and 122553, joined by "correlated" edges that carry the correlation value)
  10. co-expression visualisation (gephi)
  11. euretos - brain
     ➡ pubmed: 23 million biomedical articles
       • 1300 new ones added every day
       • google-like search interface
     ➡ reading an article ...
       • malaria is transferred by mosquitoes
  12. euretos - brain
     authors, references
  13. euretos - brain
     ooooooh crap ...
  14. euretos - brain
     ➡ nanopub (nanopub.org)
       • the smallest unit of publishable information
     ➡ assertion
       • subject: malaria
       • predicate: transferred by
       • object: mosquito
     ➡ provenance
       • how this came to be (meta-data)
  15. euretos - brain
     ➡ unfortunately, malaria is encoded in various ways ...
       • db1: malaria
       • db2: P22384
       • db3: AQ879
     (all three map to the single concept malaria)
  16. euretos - brain
     malaria, transferred by, mosquito
  17. euretos - brain
     ➡ brain (http://www.euretos.com/brain)
       • exploration and analysis platform
       • millions of concepts/triples/nanopubs
       • pubmed, uniprot, omim, pubchem, ...
     ➡ architectural stack
       • meta-data is stored in mongodb
       • graph in neo4j
       • swing interface connecting to rest endpoints
  18. brain
  19. brain
  20. brain
  21. brain
  22. brain
  23. brain
  24. brain
  25. brain
  26. Questions?
  27. datablend - continuum
     Follow us: twitter.com/data_blend
     www.datablend.be
     E-MAIL: info@datablend.be
     0499/05.00.89
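
The correlation and thresholding steps on slides 8 and 9 can be sketched in a few lines of Python. This is only an illustration, not the MongoDB map-reduce job or the Neo4j import used in the talk: it computes everything in memory, and the helper names (`pearson`, `co_expression_edges`) and the expression values for samples 122552 and 122553 are invented for the example. Reading the slide 8 numbers as x/y columns is also an assumption; under that reading the coefficient comes out at about 0.53, close to the 0,52 shown on the slide.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def co_expression_edges(profiles, threshold=0.8):
    """Return (sample_a, sample_b, r) for every sample pair with r >= threshold."""
    edges = []
    for (a, xa), (b, xb) in combinations(sorted(profiles.items()), 2):
        r = pearson(xa, xb)
        if r >= threshold:
            edges.append((a, b, r))
    return edges


# Numbers from slide 8, read as an x column and a y column (assumed layout).
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]
print(round(pearson(x, y), 2))

# Sample 122551 uses the four expression values visible on slide 7;
# the profiles for 122552 and 122553 are invented for illustration.
profiles = {
    122551: [5.54, 3.92, 7.82, 5.44],
    122552: [5.60, 4.10, 7.50, 5.30],
    122553: [7.10, 6.90, 3.20, 5.00],
}
edges = co_expression_edges(profiles)
for a, b, r in edges:
    print(f"{a} -[correlated, value={r:.2f}]- {b}")
```

Each returned tuple corresponds to one "correlated" edge between two sample nodes. With 4,800 samples there are roughly 11.5 million pairs, which is presumably why the talk pushes this step into map-reduce rather than a single in-memory loop.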
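
Slides 14 to 16 describe nanopub assertions (subject, predicate, object) and the problem that one concept is encoded differently per source. A minimal sketch of that idea follows, assuming a simple lookup table from (source, identifier) to a shared concept name. The `Assertion` class, the `SYNONYMS` table and the pairing of db1/db2/db3 with "malaria"/"P22384"/"AQ879" are assumptions made for the example, not the BRAIN data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One nanopub-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical mapping from source-specific identifiers to one shared concept.
SYNONYMS = {
    ("db1", "malaria"): "malaria",
    ("db2", "P22384"): "malaria",
    ("db3", "AQ879"): "malaria",
}


def resolve(source, identifier):
    """Map a source-specific identifier to its shared concept, if one is known."""
    return SYNONYMS.get((source, identifier), identifier)


def normalize(source, subject, predicate, obj):
    """Build an assertion whose subject and object use shared concept names."""
    return Assertion(resolve(source, subject), predicate, resolve(source, obj))


# The same statement as it might arrive from three sources (illustrative data).
raw = [
    ("db1", "malaria", "transferred by", "mosquito"),
    ("db2", "P22384", "transferred by", "mosquito"),
    ("db3", "AQ879", "transferred by", "mosquito"),
]

# Provenance: remember which sources support each normalized assertion.
provenance = defaultdict(list)
for source, subject, predicate, obj in raw:
    provenance[normalize(source, subject, predicate, obj)].append(source)

for assertion, sources in provenance.items():
    print(assertion, "supported by", sources)
```

After normalization the three raw statements collapse into a single malaria, transferred by, mosquito assertion, which would become one edge between two concept nodes in the graph, with the list of sources kept as provenance meta-data.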
