The power of graphs to analyze biological data

Transcript

  • 1. the power of graphs for analyzing biological datasets: Davy Suvee, Janssen Pharmaceutica
  • 2. about me: who am i ...
    ➡ working as an IT lead / software architect @ Janssen Pharmaceutica
      • dealing with big scientific data sets
      • hands-on expertise in big data and NoSQL technologies
    ➡ founder of datablend
      • provide big data and NoSQL consultancy
      • share practical knowledge and big data use cases via blog
    Davy Suvee, @DSUVEE
  • 3. outline
    ➡ getting visual insights into big data sets
      ★ gene expression clustering (MongoDB, Neo4j, Gephi)
      ★ mutation prevalence (Cassandra, Neo4j, Gephi)
    ➡ FluxGraph, a time machine for your graphs ...
  • 4–5. insights in big data
    ➡ typical approach through warehousing
      ★ star schema with fact tables and dimension tables
  • 6. insights in big data
      ★ real-time visualization
      ★ filtering
      ★ metrics
      ★ layouting (1, 2)
      ★ modular
    1. http://gephi.org/plugins/neo4j-graph-database-support/
    2. http://github.com/datablend/gephi-blueprints-plugin
  • 7. gene expression clustering
    ➡ oncology data set:
      ★ 4,800 samples
      ★ 27,000 genes
    ➡ question:
      ★ for a particular subset of samples, which genes are co-expressed?
  • 8. MongoDB for storing gene expressions
        {
          "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
          "sample_name" : "122551hp133a21.cel",
          "genomics_id" : 122551,
          "sample_id" : 343981,
          "donor_id" : 143981,
          "sample_type" : "Tissue",
          "sample_site" : "Ascending colon",
          "pathology_category" : "MALIGNANT",
          "pathology_morphology" : "Adenocarcinoma",
          "pathology_type" : "Primary malignant neoplasm of colon",
          "primary_site" : "Colon",
          "expressions" : [
            { "gene" : "X1_at", "expression" : 5.54217719084415 },
            { "gene" : "X10_at", "expression" : 3.92335121981739 },
            { "gene" : "X100_at", "expression" : 7.81638155662255 },
            { "gene" : "X1000_at", "expression" : 5.44318512260619 },
            …
          ]
        }
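Answering the question on slide 7 requires turning per-sample documents into per-gene vectors over a chosen subset of samples. The sketch below is a hypothetical in-memory stand-in, not the author's MongoDB code: `ExpressionPivot`, `Sample`, `select` and `geneVectors` are illustrative names, and it assumes the documents have already been loaded into Java objects.

```java
import java.util.*;

/** Hypothetical in-memory stand-in for the sample documents; field names mirror the slide's JSON. */
class ExpressionPivot {
    /** One sample document: its name, primary site and gene -> expression map. */
    record Sample(String sampleName, String primarySite, Map<String, Double> expressions) {}

    /** Keeps only the samples from one primary site, e.g. "Colon". */
    static List<Sample> select(List<Sample> all, String primarySite) {
        List<Sample> subset = new ArrayList<>();
        for (Sample s : all) {
            if (s.primarySite().equals(primarySite)) subset.add(s);
        }
        return subset;
    }

    /** One vector per gene, indexed by sample position; NaN marks a gene missing from a sample. */
    static Map<String, double[]> geneVectors(List<Sample> samples, List<String> genes) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        for (String gene : genes) {
            double[] v = new double[samples.size()];
            for (int i = 0; i < samples.size(); i++) {
                v[i] = samples.get(i).expressions().getOrDefault(gene, Double.NaN);
            }
            vectors.put(gene, v);
        }
        return vectors;
    }
}
```

Every gene vector uses the same sample order, so any two vectors can be fed directly into a pairwise correlation.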
  • 9. Pearson correlation through map-reduce
        x    y
        43   99
        21   65
        25   79
        42   75
        57   87
        59   81
        Pearson correlation: r ≈ 0.53
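One way to make Pearson correlation fit a map-reduce job is to compute it from additive sufficient statistics (n, Σx, Σy, Σx², Σy², Σxy): each mapper emits partial sums for its chunk, the reducer adds them, and r is computed once at the end. This is a minimal sketch of that idea in plain Java, not the author's MongoDB map-reduce code; `PearsonMapReduce` and `Sums` are hypothetical names.

```java
import java.util.*;

/** Pearson correlation from additive partial sums, so a reduce step can simply add them up. */
class PearsonMapReduce {
    /** Partial sums over a chunk of (x, y) pairs: n, Σx, Σy, Σx², Σy², Σxy. */
    record Sums(long n, double sx, double sy, double sxx, double syy, double sxy) {
        static final Sums ZERO = new Sums(0, 0, 0, 0, 0, 0);

        /** "map": the partial sums contributed by a single pair. */
        static Sums of(double x, double y) {
            return new Sums(1, x, y, x * x, y * y, x * y);
        }

        /** "reduce": component-wise addition is associative and commutative. */
        Sums plus(Sums o) {
            return new Sums(n + o.n, sx + o.sx, sy + o.sy, sxx + o.sxx, syy + o.syy, sxy + o.sxy);
        }

        /** r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²)) */
        double pearson() {
            double num = n * sxy - sx * sy;
            double den = Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
            return num / den;
        }
    }

    static double correlate(double[] x, double[] y) {
        Sums total = Sums.ZERO;
        for (int i = 0; i < x.length; i++) total = total.plus(Sums.of(x[i], y[i]));
        return total.pearson();
    }

    public static void main(String[] args) {
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        System.out.printf(Locale.ROOT, "r = %.4f%n", correlate(x, y)); // prints "r = 0.5298"
    }
}
```

With the six pairs from the slide, r comes out at about 0.5298. Because the sums can be combined in any order, chunks processed on different nodes give the same result. For very large values, a numerically more stable merge (e.g. of running means and co-moments) may be preferable.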
  • 10. co-expression graph
    ➡ create a node for each gene
    ➡ if the correlation between two genes is >= 0.8, draw an edge between both nodes
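The construction rule on slide 10 can be sketched as an adjacency map: one entry per gene (so genes without strong partners still appear as isolated nodes) and an undirected edge for each pair with r >= threshold. This is an illustrative sketch, not the author's Neo4j loading code; `CoExpressionGraph` and `build` are hypothetical names, and the correlation uses the standard two-pass Pearson formula.

```java
import java.util.*;

/** Hypothetical sketch: threshold pairwise Pearson correlations into an undirected co-expression graph. */
class CoExpressionGraph {
    static Map<String, Set<String>> build(Map<String, double[]> vectors, double threshold) {
        Map<String, Set<String>> adjacency = new TreeMap<>();
        for (String gene : vectors.keySet()) adjacency.put(gene, new TreeSet<>()); // one node per gene
        List<String> genes = new ArrayList<>(vectors.keySet());
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) { // visit each unordered pair once
                String a = genes.get(i), b = genes.get(j);
                if (pearson(vectors.get(a), vectors.get(b)) >= threshold) {
                    adjacency.get(a).add(b); // undirected: store the edge on both endpoints
                    adjacency.get(b).add(a);
                }
            }
        }
        return adjacency;
    }

    /** Two-pass Pearson correlation: subtract the means first, then sum the co-deviations. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int k = 0; k < n; k++) { mx += x[k]; my += y[k]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - mx, dy = y[k] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Note that the number of pairs grows quadratically with the number of genes (about 27,000 here), which is one reason to push the pairwise work into a distributed map-reduce job.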
  • 11. co-expression graph
  • 12. graphs and time ...
    ➡ reproducible graph state
    ➡ towards a time-aware graph ...
    ➡ FluxGraph: a Blueprints-compatible graph on top of Datomic
    ➡ make FluxGraph fully time-aware
      ★ travel your graph through time
      ★ time-scoped iteration of vertices and edges
      ★ temporal graph comparison
  • 13–17. travel through time
        FluxGraph fg = new FluxGraph();
        Vertex davy = fg.addVertex();
        davy.setProperty("name", "Davy");
        Vertex peter = ...
        Vertex michael = ...
        Edge e1 = fg.addEdge(davy, peter, "knows");
    (diagram: Davy knows Peter; Michael is not connected yet)
  • 18–21. travel through time
        Date checkpoint = new Date();
        davy.setProperty("name", "David");
        Edge e2 = fg.addEdge(davy, michael, "knows");
    (diagram: after the checkpoint, Davy is renamed David and also knows Michael)
  • 22. travel through time: by default, the graph shows its current state
    (diagram: timeline with the checkpoint state, where Davy knows Peter, and the current state, where David knows Peter and Michael)
  • 23. travel through time
        fg.setCheckpointTime(checkpoint);
    (same diagram as slide 22; after this call, the graph is read as it was at the checkpoint)
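To make the checkpoint idea concrete, here is a toy in-memory model: every write is kept with a timestamp instead of overwriting the old value, and reads return the latest value at or before the read time. It is a minimal sketch under stated assumptions, not FluxGraph's implementation (which the slides say sits on top of Datomic): `TimeTravelSketch`, `addKnowsEdge` and `knows` are hypothetical names, and a logical clock replaces `java.util.Date`.

```java
import java.util.*;

/** Hypothetical sketch of reading a graph "as of" a checkpoint; uses a logical clock, not Date. */
class TimeTravelSketch {
    // vertex -> property key -> (time -> value); every write is kept, never overwritten
    private final Map<String, Map<String, TreeMap<Long, Object>>> props = new HashMap<>();
    // vertex -> (time the "knows" edge was added -> other vertex)
    private final Map<String, TreeMap<Long, String>> knowsEdges = new HashMap<>();
    private long clock = 0;          // logical time, advanced by every write
    private Long checkpoint = null;  // null means "read the current state"

    long now() { return clock; }

    void setProperty(String vertex, String key, Object value) {
        props.computeIfAbsent(vertex, k -> new HashMap<>())
             .computeIfAbsent(key, k -> new TreeMap<>())
             .put(++clock, value);
    }

    void addKnowsEdge(String out, String in) {
        knowsEdges.computeIfAbsent(out, k -> new TreeMap<>()).put(++clock, in);
    }

    /** Pass null to return to the current state. */
    void setCheckpointTime(Long time) { checkpoint = time; }

    private long readTime() { return checkpoint == null ? Long.MAX_VALUE : checkpoint; }

    /** Latest value written at or before the read time, or null if none. */
    Object getProperty(String vertex, String key) {
        TreeMap<Long, Object> history = props.getOrDefault(vertex, Map.of()).get(key);
        if (history == null) return null;
        Map.Entry<Long, Object> entry = history.floorEntry(readTime());
        return entry == null ? null : entry.getValue();
    }

    /** Vertices this vertex "knows" at the read time, in the order the edges were added. */
    List<String> knows(String vertex) {
        TreeMap<Long, String> edges = knowsEdges.getOrDefault(vertex, new TreeMap<>());
        return new ArrayList<>(edges.headMap(readTime(), true).values());
    }

    public static void main(String[] args) {
        TimeTravelSketch fg = new TimeTravelSketch();
        fg.setProperty("davy", "name", "Davy");
        fg.addKnowsEdge("davy", "peter");
        long checkpoint = fg.now();               // plays the role of `new Date()` on slide 18
        fg.setProperty("davy", "name", "David");
        fg.addKnowsEdge("davy", "michael");
        System.out.println(fg.getProperty("davy", "name") + " knows " + fg.knows("davy")); // David knows [peter, michael]
        fg.setCheckpointTime(checkpoint);
        System.out.println(fg.getProperty("davy", "name") + " knows " + fg.knows("davy")); // Davy knows [peter]
    }
}
```

Because nothing is ever overwritten, moving the checkpoint is just a change of read time; no data has to be copied or restored.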
  • 24. time-scoped iteration
    (diagram: successive versions Davy, Davy’, Davy’’, Davy’’’ at t1, t2, t3 and tcurrent, each version created by a change)
    ➡ how do you find the version of the vertex you are interested in?
  • 25–29. time-scoped iteration
    (diagram: the same versions, linked to each other by next and previous)
        Vertex previousDavy = davy.getPreviousVersion();
        Iterable<Vertex> allDavy = davy.getNextVersions();
        Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
        Interval valid = davy.getTimerInterval();
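The version chain in the diagram can be modeled as a doubly linked list of immutable snapshots, each valid over a half-open time interval. This hypothetical sketch mirrors the method names on the slide for readability but is not FluxGraph's code: `VersionedVertex`, `change` and `validInterval` are illustrative, and a plain `java.util.function.Predicate` replaces the slide's `filter` object.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical version chain; each version is valid from `start` (inclusive) to `end` (exclusive). */
class VersionedVertex {
    final Map<String, Object> properties;
    final long start;
    long end = Long.MAX_VALUE;         // open-ended until a newer version replaces it
    VersionedVertex previous, next;

    VersionedVertex(Map<String, Object> properties, long start) {
        this.properties = Map.copyOf(properties); // each snapshot is immutable
        this.start = start;
    }

    /** A change creates a new version: copy the properties, apply the change, close this interval. */
    VersionedVertex change(String key, Object value, long time) {
        Map<String, Object> copy = new HashMap<>(properties);
        copy.put(key, value);
        VersionedVertex newer = new VersionedVertex(copy, time);
        this.end = time;
        this.next = newer;
        newer.previous = this;
        return newer;
    }

    VersionedVertex getPreviousVersion() { return previous; }

    /** All later versions, oldest first. */
    List<VersionedVertex> getNextVersions() {
        List<VersionedVertex> out = new ArrayList<>();
        for (VersionedVertex v = next; v != null; v = v.next) out.add(v);
        return out;
    }

    /** Earlier versions accepted by the filter, newest first. */
    List<VersionedVertex> getPreviousVersions(Predicate<VersionedVertex> filter) {
        List<VersionedVertex> out = new ArrayList<>();
        for (VersionedVertex v = previous; v != null; v = v.previous) {
            if (filter.test(v)) out.add(v);
        }
        return out;
    }

    /** {start, end} of the interval during which this version was the current one. */
    long[] validInterval() { return new long[] {start, end}; }
}
```

A usage example: starting from `new VersionedVertex(Map.of("name", "Davy"), 1)` and calling `change` at t2, t3 and t4 yields the four versions from the diagram, and `getPreviousVersions(v -> v.start >= 2)` on the last one returns only the versions created at t3 and t2.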
  • 30–32. time-scoped iteration
    ➡ when does an element change?
    ➡ vertex:
      ★ setting or removing a property
      ★ adding it to or removing it from an edge
      ★ being removed
    ➡ edge:
      ★ setting or removing a property
      ★ being removed
    ➡ ... and each element is time-scoped!
  • 33. temporal graph comparison
    ➡ what changed?
    (diagram: the current graph, where David knows Peter and Michael, next to the checkpoint graph, where Davy knows Peter)
  • 34–35. temporal graph comparison
    ➡ difference(A, B) = union(A, B) - B
    ➡ ... as an (immutable) graph!
    (diagram: difference(current, checkpoint) yields David and a "knows" edge)
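The difference operator can be illustrated with plain sets of graph elements. In this hypothetical sketch each element is encoded as a string that captures both its identity and its state (e.g. `v:davy{name=David}`), so a renamed vertex counts as changed; the encoding and the `GraphDifference` class are assumptions for illustration, not FluxGraph's representation.

```java
import java.util.*;

/** Hypothetical set-based sketch of difference(A, B) = union(A, B) - B over graph elements. */
class GraphDifference {
    /** Elements are strings encoding identity plus state, e.g. "v:davy{name=David}". */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);       // start from A ...
        result.addAll(b);                            // ... union(A, B)
        result.removeAll(b);                         // ... minus B
        return Collections.unmodifiableSet(result);  // read-only, like an immutable graph
    }

    public static void main(String[] args) {
        Set<String> checkpoint = Set.of("v:davy{name=Davy}", "v:peter{name=Peter}",
                "v:michael{name=Michael}", "e:davy-knows->peter");
        Set<String> current = Set.of("v:davy{name=David}", "v:peter{name=Peter}",
                "v:michael{name=Michael}", "e:davy-knows->peter", "e:davy-knows->michael");
        // prints [e:davy-knows->michael, v:davy{name=David}]
        System.out.println(difference(current, checkpoint));
    }
}
```

Since union(A, B) − B is always equal to A − B, the union step is only there to mirror the definition; a direct `removeAll` on a copy of A gives the same result.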
  • 36. use case: longitudinal patient data
    (diagram: one patient tracked over time points t1 to t5, with smoking, cancer and death linked to the patient over time)
  • 37–38. use case: longitudinal patient data
    ➡ historical data for 15,000 patients over a period of 10 years (2001–2010)
    ➡ example analysis:
      ★ if a male patient is no longer smoking in 2005,
      ★ what are the chances of getting lung cancer in 2010, comparing patients that smoked before 2005 with patients that never smoked?
  • 39–41. use case: longitudinal patient data
    ➡ get all male non-smokers in 2005
        // read the graph as it was at the end of 2005
        fg.setCheckpointTime(new DateTime(2005, 12, 31).toDate());
        Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
        while (males.hasNext()) {
            Vertex p2005 = males.next();
            // smoking in 2005 if the patient has an outgoing "smokingStatus" edge at the checkpoint
            boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
        }
  • 42. use case: longitudinal patient data
    ➡ which patients were smoking before 2005?
        // true if any earlier version of the patient vertex had a "smokingStatus" edge
        boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
            public TimeAwareElement filter(TimeAwareVertex element) {
                return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
            }
        }).iterator().hasNext();
  • 43–44. use case: longitudinal patient data
    ➡ which patients have cancer in 2010?
        // smokerws: the working set of smokers
        Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());
    ➡ extract the patients that have an edge to the cancer node
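The last step, extracting the patients with an edge to the cancer node and comparing the groups from slide 38, reduces to a filter and two proportions. The sketch below is hypothetical: `CancerRateSketch`, `Edge`, `patientsWithEdgeTo` and `rate` are illustrative names, the edge list stands in for the edges of the difference graph `g`, and a simple proportion is only one way to express "chances"; it is not presented as the author's analysis.

```java
import java.util.*;

/** Hypothetical final step; the edge list stands in for the edges of the difference graph g. */
class CancerRateSketch {
    record Edge(String from, String to) {}

    /** Patients (edge sources) that have an edge pointing at the given node, e.g. "cancer". */
    static Set<String> patientsWithEdgeTo(Collection<Edge> edges, String node) {
        Set<String> patients = new TreeSet<>();
        for (Edge e : edges) {
            if (e.to().equals(node)) patients.add(e.from());
        }
        return patients;
    }

    /** Fraction of a cohort that appears in the affected set; NaN for an empty cohort. */
    static double rate(Set<String> cohort, Set<String> affected) {
        if (cohort.isEmpty()) return Double.NaN;
        long hits = cohort.stream().filter(affected::contains).count();
        return (double) hits / cohort.size();
    }
}
```

The two cohorts would come from the loops on slides 41–42 (former smokers versus never-smokers); comparing `rate(formerSmokers, withCancer)` with `rate(neverSmokers, withCancer)` then answers the question, without any adjustment for other risk factors.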
  • 45. Questions?