The power of graphs to analyze biological data

Published in: Technology

  1. 1. the power of graphs for analyzing biological datasets (Davy Suvee, Janssen Pharmaceutica)
  2. 2. about me: who am i ... ➡ working as an it lead / software architect @ janssen pharmaceutica • dealing with big scientific data sets • hands-on expertise in big data and NoSQL technologies ➡ founder of datablend • provide big data and NoSQL consultancy • share practical knowledge and big data use cases via blog (Davy Suvee, @DSUVEE)
  3. 3. outline ➡ getting visual insights into big data sets ★ gene expression clustering (mongodb, Neo4j, Gephi) ★ Mutation prevalence (cassandra, Neo4j, Gephi) ➡ fluxgraph, a time machine for your graphs ...
  4. 4. insights in big data ➡ typical approach through warehousing ★ star schema with fact tables and dimension tables
  6. 6. insights in big data ★ real-time visualization ★ filtering ★ metrics ★ layouting [1, 2] ★ modular
       [1] http://gephi.org/plugins/neo4j-graph-database-support/
       [2] http://github.com/datablend/gephi-blueprints-plugin
  7. 7. gene expression clustering ➡ oncology data set: ★ 4,800 samples ★ 27,000 genes ➡ Question: ★ for a particular subset of samples, which genes are co-expressed?
  8. 8. mongodb for storing gene expressions
       { "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
         "sample_name" : "122551hp133a21.cel",
         "genomics_id" : 122551,
         "sample_id" : 343981,
         "donor_id" : 143981,
         "sample_type" : "Tissue",
         "sample_site" : "Ascending colon",
         "pathology_category" : "MALIGNANT",
         "pathology_morphology" : "Adenocarcinoma",
         "pathology_type" : "Primary malignant neoplasm of colon",
         "primary_site" : "Colon",
         "expressions" : [ { "gene" : "X1_at", "expression" : 5.54217719084415 },
                           { "gene" : "X10_at", "expression" : 3.92335121981739 },
                           { "gene" : "X100_at", "expression" : 7.81638155662255 },
                           { "gene" : "X1000_at", "expression" : 5.44318512260619 },
                           … ] }
  9. 9. pearson correlation through map-reduce
       x:  43  21  25  42  57  59
       y:  99  65  79  75  87  81
       pearson correlation: 0.52
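The correlation on the slide above can be reproduced with a few lines of plain Java. This is only a minimal in-memory sketch, not the map-reduce job used in the talk; the class and method names are made up for illustration. The five running sums it keeps are the kind of values a map-reduce job could accumulate per gene pair.

```java
import java.util.Locale;

// Hypothetical sketch: in-memory Pearson correlation, NOT the talk's MongoDB map-reduce code.
final class PearsonSketch {

    /** Returns the Pearson correlation coefficient of two equally long series. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have the same length (>= 2)");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        // Single pass: accumulate the five sums the formula needs.
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        // A constant series makes the denominator zero, so the result is NaN in that case.
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        // Prints "r = 0.5298", which the slide shows truncated to 0.52.
        System.out.println(String.format(Locale.ROOT, "r = %.4f", pearson(x, y)));
    }
}
```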
  10. 10. co-expression graph ➡ create a node for each gene ➡ if correlation between two genes >= 0.8, draw an edge between both nodes
  11. 11. co-expression graph
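The construction rule from the co-expression slides can be sketched as follows. This is an illustration under assumptions: the graph is a plain adjacency map rather than a Neo4j database, and the gene names and expression values are invented toy data, not values from the oncology data set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one node per gene, an undirected edge when correlation >= 0.8.
final class CoExpressionSketch {
    static final double THRESHOLD = 0.8;

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxy += x[i] * y[i]; sxx += x[i] * x[i]; syy += y[i] * y[i];
        }
        return (n * sxy - sx * sy) / Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    }

    /** Builds an adjacency list from per-gene expression profiles of equal length. */
    static Map<String, List<String>> build(Map<String, double[]> profiles) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        for (String gene : profiles.keySet()) {
            graph.put(gene, new ArrayList<>()); // a node for each gene
        }
        List<String> genes = new ArrayList<>(profiles.keySet());
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                String a = genes.get(i), b = genes.get(j);
                if (pearson(profiles.get(a), profiles.get(b)) >= THRESHOLD) {
                    graph.get(a).add(b); // undirected edge, stored in both directions
                    graph.get(b).add(a);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, double[]> profiles = new LinkedHashMap<>();
        profiles.put("geneA", new double[]{1, 2, 3, 4});   // toy values
        profiles.put("geneB", new double[]{2, 4, 6, 8.5}); // nearly proportional to geneA
        profiles.put("geneC", new double[]{4, 1, 3, 2});   // negatively correlated with geneA
        System.out.println(build(profiles)); // {geneA=[geneB], geneB=[geneA], geneC=[]}
    }
}
```

Comparing every pair is quadratic in the number of genes, so for 27,000 genes a real pipeline would need to distribute or prune this step.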
  12. 12. graphs and time ... ➡ reproducible graph state ➡ towards a time-aware graph ... ➡ fluxgraph: a blueprints-compatible graph on top of Datomic ➡ make FluxGraph fully time-aware ★ travel your graph through time ★ time-scoped iteration of vertices and edges ★ temporal graph comparison
  13. 13. travel through time
        FluxGraph fg = new FluxGraph();
  14. 14. travel through time [diagram: vertex Davy]
        FluxGraph fg = new FluxGraph();
        Vertex davy = fg.addVertex();
        davy.setProperty("name", "Davy");
  15. 15. travel through time [diagram: vertices Davy and Peter]
        FluxGraph fg = new FluxGraph();
        Vertex davy = fg.addVertex();
        davy.setProperty("name", "Davy");
        Vertex peter = ...
  16. 16. travel through time [diagram: vertices Davy, Peter and Michael]
        FluxGraph fg = new FluxGraph();
        Vertex davy = fg.addVertex();
        davy.setProperty("name", "Davy");
        Vertex peter = ...
        Vertex michael = ...
  17. 17. travel through time [diagram: Davy knows Peter; Michael unconnected]
        FluxGraph fg = new FluxGraph();
        Vertex davy = fg.addVertex();
        davy.setProperty("name", "Davy");
        Vertex peter = ...
        Vertex michael = ...
        Edge e1 = fg.addEdge(davy, peter, "knows");
  18. 18. travel through time [diagram: Davy knows Peter; Michael unconnected]
        Date checkpoint = new Date();
  19. 19. travel through time [diagram: Davy knows Peter; Michael unconnected]
        Date checkpoint = new Date();
        davy.setProperty("name", "David");
  20. 20. travel through time [diagram: David knows Peter; Michael unconnected]
        Date checkpoint = new Date();
        davy.setProperty("name", "David");
  21. 21. travel through time [diagram: David knows Peter and Michael]
        Date checkpoint = new Date();
        davy.setProperty("name", "David");
        Edge e2 = fg.addEdge(davy, michael, "knows");
  22. 22. travel through time [diagram: timeline with two graph states; at the checkpoint, Davy knows Peter and Michael is unconnected; at the current time, David knows Peter and Michael; by default the graph shows the current state]
  23. 23. travel through time [diagram: same timeline, now viewing the checkpoint state]
        fg.setCheckpointTime(checkpoint);
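The idea behind a checkpoint view can be illustrated without FluxGraph or Datomic. The sketch below is an assumption-laden toy, not FluxGraph's implementation: it keeps the history of a single property in a sorted map and answers "what was the value at time t" with a floor lookup. The class name and the logical timestamps 1, 2, 3 are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a time-aware property; not how FluxGraph stores data.
final class TimeTravelSketch {
    // timestamp -> value that became valid at that timestamp
    private final TreeMap<Long, String> history = new TreeMap<>();

    /** Records a new value as of the given (logical) time; older values are kept. */
    void set(long time, String value) {
        history.put(time, value);
    }

    /** Returns the value that was valid at the given time, or null if none was set yet. */
    String asOf(long time) {
        Map.Entry<Long, String> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    /** Returns the latest value, which is what a reader sees by default. */
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }

    public static void main(String[] args) {
        TimeTravelSketch name = new TimeTravelSketch();
        name.set(1L, "Davy");
        long checkpoint = 2L;          // taken after the first write
        name.set(3L, "David");         // later change
        System.out.println(name.current());        // David
        System.out.println(name.asOf(checkpoint)); // Davy
    }
}
```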
  24. 24. time-scoped iteration [diagram: timeline t1, t2, t3, tcurrent; each change produces a new version: Davy, Davy’, Davy’’, Davy’’’] ➡ how to find the version of the vertex you are interested in?
  25. 25. time-scoped iteration [diagram: timeline t1, t2, t3, tcurrent; versions Davy, Davy’, Davy’’, Davy’’’ linked by next and previous]
  26. 26. time-scoped iteration [diagram: same version chain]
        Vertex previousDavy = davy.getPreviousVersion();
  27. 27. time-scoped iteration [diagram: same version chain]
        Vertex previousDavy = davy.getPreviousVersion();
        Iterable<Vertex> allDavy = davy.getNextVersions();
  28. 28. time-scoped iteration [diagram: same version chain]
        Vertex previousDavy = davy.getPreviousVersion();
        Iterable<Vertex> allDavy = davy.getNextVersions();
        Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
  29. 29. time-scoped iteration [diagram: same version chain]
        Vertex previousDavy = davy.getPreviousVersion();
        Iterable<Vertex> allDavy = davy.getNextVersions();
        Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
        Interval valid = davy.getTimerInterval();
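The version chain in the diagrams above can be modelled as a doubly linked list in which every version carries its own validity interval. The following is a toy sketch under that assumption. The names `Version`, `change`, `previousVersions` and `nextVersions` are invented here and only mirror the slide's method names loosely; they are not FluxGraph's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a version chain with next/previous links.
final class VersionChainSketch {

    static final class Version {
        final String name;
        final long validFrom;
        long validTo = Long.MAX_VALUE; // open-ended until a newer version replaces this one
        Version previous;
        Version next;

        Version(String name, long validFrom) {
            this.name = name;
            this.validFrom = validFrom;
        }

        /** Records a change at the given time and returns the newly created version. */
        Version change(String newName, long time) {
            Version newer = new Version(newName, time);
            this.validTo = time;   // this version stops being valid when the change happens
            this.next = newer;
            newer.previous = this;
            return newer;
        }

        /** Older versions that pass the filter, nearest first. */
        List<Version> previousVersions(Predicate<Version> filter) {
            List<Version> result = new ArrayList<>();
            for (Version v = previous; v != null; v = v.previous) {
                if (filter.test(v)) {
                    result.add(v);
                }
            }
            return result;
        }

        /** All newer versions, nearest first. */
        List<Version> nextVersions() {
            List<Version> result = new ArrayList<>();
            for (Version v = next; v != null; v = v.next) {
                result.add(v);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Version v0 = new Version("Davy", 1);    // t1
        Version v1 = v0.change("Davy'", 2);     // t2
        Version v2 = v1.change("Davy''", 3);    // t3
        Version v3 = v2.change("Davy'''", 4);   // current
        System.out.println(v3.previous.name);          // Davy''
        System.out.println(v0.nextVersions().size());  // 3
        System.out.println(v3.previousVersions(v -> v.validFrom < 3).size()); // 2
    }
}
```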
  30. 30. time-scoped iteration ➡ When does an element change? ➡ vertex: ★ setting or removing a property ★ add or remove it from an edge ★ being removed
  31. 31. time-scoped iteration ➡ When does an element change? ➡ vertex: ★ setting or removing a property ★ add or remove it from an edge ★ being removed ➡ edge: ★ setting or removing a property ★ being removed
  32. 32. time-scoped iteration ➡ When does an element change? ➡ vertex: ★ setting or removing a property ★ add or remove it from an edge ★ being removed ➡ edge: ★ setting or removing a property ★ being removed ➡ ... and each element is time-scoped!
  33. 33. temporal graph comparison [diagram: the current graph (David knows Peter and Michael) next to the checkpoint graph (Davy knows Peter; Michael unconnected)] what changed?
  34. 34. temporal graph comparison ➡ difference (A , B) = union (A , B) - B ➡ ... as an (immutable) graph!
  35. 35. temporal graph comparison ➡ difference (A , B) = union (A , B) - B ➡ ... as an (immutable) graph! [diagram: the difference of the two graphs is the vertex David with a knows edge]
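The difference formula can be tried out on plain sets. In this sketch each graph state is reduced to a set of string "facts" (vertex properties and edges), which is an assumption made purely for illustration; FluxGraph returns the difference as a graph, not as a set of strings, and the identifiers v1, v2, v3 are invented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: difference(A, B) = union(A, B) - B on sets of graph facts.
final class GraphDifferenceSketch {

    /** Returns the elements that are in A but not in B, as an unmodifiable set. */
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);     // union(A, B)
        result.removeAll(b);  // ... minus B
        return Collections.unmodifiableSet(result); // read-only, like the immutable result graph
    }

    public static void main(String[] args) {
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
                "v1.name=Davy", "v2.name=Peter", "v3.name=Michael", "v1-knows->v2"));
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
                "v1.name=David", "v2.name=Peter", "v3.name=Michael", "v1-knows->v2", "v1-knows->v3"));
        // Only what changed since the checkpoint remains.
        System.out.println(difference(current, checkpoint)); // [v1.name=David, v1-knows->v3]
    }
}
```

For sets, union(A, B) - B is the same as A - B; the sketch keeps both steps only to follow the formula on the slide.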
  36. 36. use case: longitudinal patient data [diagram: timeline t1 to t5 with one patient vertex per time point, linked at various times to smoking, cancer and death nodes]
  37. 37. use case: longitudinal patient data ➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
  38. 38. use case: longitudinal patient data ➡ historical data for 15,000 patients over a period of 10 years (2001-2010) ➡ example analysis: ★ if a male patient is no longer smoking in 2005, ★ what are the chances of getting lung cancer in 2010, comparing patients that smoked before 2005 with patients that never smoked?
  39. 39. use case: longitudinal patient data ➡ get all male non-smokers in 2005
        fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
  40. 40. use case: longitudinal patient data ➡ get all male non-smokers in 2005
        fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
        Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
  41. 41. use case: longitudinal patient data ➡ get all male non-smokers in 2005
        fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
        Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
        while (males.hasNext()) {
            Vertex p2005 = males.next();
            boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
        }
  42. 42. use case: longitudinal patient data ➡ which patients were smoking before 2005?
        boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
            public TimeAwareElement filter(TimeAwareVertex element) {
                return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
            }
        }).iterator().hasNext();
  43. 43. use case: longitudinal patient data ➡ which patients have cancer in 2010? (smokerws is the working set of smokers)
        Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());
  44. 44. use case: longitudinal patient data ➡ which patients have cancer in 2010? (smokerws is the working set of smokers)
        Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());
      ➡ extract the patients that have an edge to the cancer node
  45. 45. Questions?