The power of graphs to analyze biological data

Published in: Technology

Transcript of "The power of graphs to analyze biological data"

  1. the power of graphs for analyzing biological datasets; Davy Suvee, Janssen Pharmaceutica
  2. about me: who am i ... ➡ working as an it lead / software architect @ janssen pharmaceutica • dealing with big scientific data sets • hands-on expertise in big data and NoSQL technologies ➡ founder of datablend • provide big data and NoSQL consultancy • share practical knowledge and big data use cases via blog; Davy Suvee, @DSUVEE
  3. outline ➡ getting visual insights into big data sets ★ gene expression clustering (mongodb, Neo4j, Gephi) ★ mutation prevalence (cassandra, Neo4j, Gephi) ➡ fluxgraph, a time machine for your graphs ...
  4. insights in big data ➡ typical approach through warehousing ★ star schema with fact tables and dimension tables
  5. insights in big data ➡ typical approach through warehousing ★ star schema with fact tables and dimension tables
  6. insights in big data ★ real-time visualization ★ filtering ★ metrics ★ layouting ★ modular; footnotes: 1. http://gephi.org/plugins/neo4j-graph-database-support/ 2. http://github.com/datablend/gephi-blueprints-plugin
  7. gene expression clustering ➡ oncology data set: ★ 4,800 samples ★ 27,000 genes ➡ question: ★ for a particular subset of samples, which genes are co-expressed?
  8. mongodb for storing gene expressions: { "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" }, "sample_name" : "122551hp133a21.cel", "genomics_id" : 122551, "sample_id" : 343981, "donor_id" : 143981, "sample_type" : "Tissue", "sample_site" : "Ascending colon", "pathology_category" : "MALIGNANT", "pathology_morphology" : "Adenocarcinoma", "pathology_type" : "Primary malignant neoplasm of colon", "primary_site" : "Colon", "expressions" : [ { "gene" : "X1_at", "expression" : 5.54217719084415 }, { "gene" : "X10_at", "expression" : 3.92335121981739 }, { "gene" : "X100_at", "expression" : 7.81638155662255 }, { "gene" : "X1000_at", "expression" : 5.44318512260619 }, … ] }
  9. pearson correlation through map-reduce; example (x, y) pairs: (43, 99), (21, 65), (25, 79), (42, 75), (57, 87), (59, 81); pearson correlation: 0.52
  10. co-expression graph ➡ create a node for each gene ➡ if correlation between two genes >= 0.8, draw an edge between both nodes
  11. co-expression graph
  12. graphs and time ... ➡ reproducible graph state ➡ towards a time-aware graph ... ➡ fluxgraph: a blueprints-compatible graph on top of Datomic ➡ make FluxGraph fully time-aware ★ travel your graph through time ★ time-scoped iteration of vertices and edges ★ temporal graph comparison
  13. travel through time: FluxGraph fg = new FluxGraph();
  14. travel through time: FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); [diagram: Davy]
  15. travel through time: FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); Vertex peter = ... [diagram: Davy; Peter]
  16. travel through time: FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); Vertex peter = ... Vertex michael = ... [diagram: Davy; Peter; Michael]
  17. travel through time: FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); Vertex peter = ... Vertex michael = ... Edge e1 = fg.addEdge(davy, peter, "knows"); [diagram: Davy -knows-> Peter; Michael]
  18. travel through time: Date checkpoint = new Date(); [diagram: Davy -knows-> Peter; Michael]
  19. travel through time: Date checkpoint = new Date(); davy.setProperty("name", "David"); [diagram: Davy -knows-> Peter; Michael]
  20. travel through time: Date checkpoint = new Date(); davy.setProperty("name", "David"); [diagram: David -knows-> Peter; Michael]
  21. travel through time: Date checkpoint = new Date(); davy.setProperty("name", "David"); Edge e2 = fg.addEdge(davy, michael, "knows"); [diagram: David -knows-> Peter; David -knows-> Michael]
  22. travel through time [timeline diagram, labeled "by default": checkpoint state: Davy -knows-> Peter; Michael; current state: David -knows-> Peter; David -knows-> Michael]
  23. travel through time [timeline diagram: checkpoint state: Davy -knows-> Peter; Michael; current state: David -knows-> Peter; David -knows-> Michael] fg.setCheckpointTime(checkpoint);
  24. time-scoped iteration [timeline diagram: t1, t2, t3, tcurrent; Davy -change-> Davy’ -change-> Davy’’ -change-> Davy’’’] ➡ how to find the version of the vertex you are interested in?
  25. time-scoped iteration [timeline diagram: t1, t2, t3, tcurrent; versions Davy, Davy’, Davy’’, Davy’’’ linked by next / previous]
  26. time-scoped iteration [timeline diagram: t1, t2, t3, tcurrent; versions Davy, Davy’, Davy’’, Davy’’’ linked by next / previous] Vertex previousDavy = davy.getPreviousVersion();
  27. time-scoped iteration [timeline diagram: t1, t2, t3, tcurrent; versions Davy, Davy’, Davy’’, Davy’’’ linked by next / previous] Vertex previousDavy = davy.getPreviousVersion(); Iterable<Vertex> allDavy = davy.getNextVersions();
  28. time-scoped iteration [timeline diagram: t1, t2, t3, tcurrent; versions Davy, Davy’, Davy’’, Davy’’’ linked by next / previous] Vertex previousDavy = davy.getPreviousVersion(); Iterable<Vertex> allDavy = davy.getNextVersions(); Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
  29. time-scoped iteration [timeline diagram: t1, t2, t3, tcurrent; versions Davy, Davy’, Davy’’, Davy’’’ linked by next / previous] Vertex previousDavy = davy.getPreviousVersion(); Iterable<Vertex> allDavy = davy.getNextVersions(); Iterable<Vertex> selDavy = davy.getPreviousVersions(filter); Interval valid = davy.getTimerInterval();
  30. time-scoped iteration ➡ when does an element change? ➡ vertex: ★ setting or removing a property ★ add or remove it from an edge ★ being removed
  31. time-scoped iteration ➡ when does an element change? ➡ vertex: ★ setting or removing a property ★ add or remove it from an edge ★ being removed ➡ edge: ★ setting or removing a property ★ being removed
  32. time-scoped iteration ➡ when does an element change? ➡ vertex: ★ setting or removing a property ★ add or remove it from an edge ★ being removed ➡ edge: ★ setting or removing a property ★ being removed ➡ ... and each element is time-scoped!
  33. temporal graph comparison: what changed? [diagram: current: David -knows-> Peter; David -knows-> Michael; checkpoint: Davy -knows-> Peter; Michael]
  34. temporal graph comparison ➡ difference (A , B) = union (A , B) - B ➡ ... as a (immutable) graph!
  35. temporal graph comparison ➡ difference (A , B) = union (A , B) - B ➡ ... as a (immutable) graph! [diagram: the difference of the two graph states is the vertex David plus one knows edge]
  36. use case: longitudinal patient data [timeline diagram: one patient version at each of t1, t2, t3, t4, t5, with the labels smoking, smoking, death, cancer, cancer attached along the timeline]
  37. use case: longitudinal patient data ➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
  38. use case: longitudinal patient data ➡ historical data for 15,000 patients over a period of 10 years (2001-2010) ➡ example analysis: ★ if a male patient is no longer smoking in 2005 ★ what are the chances of getting lung cancer in 2010, comparing • patients that smoked before 2005 • patients that never smoked
  39. use case: longitudinal patient data ➡ get all male non-smokers in 2005: fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
  40. use case: longitudinal patient data ➡ get all male non-smokers in 2005: fg.setCheckpointTime(new DateTime(2005,12,31).toDate()); Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
  41. use case: longitudinal patient data ➡ get all male non-smokers in 2005: fg.setCheckpointTime(new DateTime(2005,12,31).toDate()); Iterator<Vertex> males = fg.getVertices("gender", "male").iterator(); while (males.hasNext()) { Vertex p2005 = males.next(); boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext(); }
  42. use case: longitudinal patient data ➡ which patients were smoking before 2005? boolean smokingBefore2005 = ((FluxVertex)p2005).getPreviousVersions(new TimeAwareFilter() { public TimeAwareElement filter(TimeAwareVertex element) { return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null; } }).iterator().hasNext();
  43. use case: longitudinal patient data ➡ which patients have cancer in 2010: Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate()); (smokerws: working set of smokers)
  44. use case: longitudinal patient data ➡ which patients have cancer in 2010: Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate()); (smokerws: working set of smokers) ➡ extract the patients that have an edge to the cancer node
  45. Questions?
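
The correlation step on slide 9 and the edge rule on slide 10 can be sketched in plain Java. This is a single-process illustration, not the map-reduce job from the talk: the class name `CoExpressionSketch`, both method names, the `geneA--geneB` edge notation, and the three-gene sample data are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names and sample genes are hypothetical. */
final class CoExpressionSketch {

    /** Pearson correlation of two equally long series (NaN if either series is constant). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator =
                Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return numerator / denominator;
    }

    /** One "geneA--geneB" entry per unordered gene pair whose correlation is >= threshold. */
    static List<String> coExpressionEdges(Map<String, double[]> expressionsByGene,
                                          double threshold) {
        List<String> genes = new ArrayList<>(expressionsByGene.keySet());
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                double r = pearson(expressionsByGene.get(genes.get(i)),
                                   expressionsByGene.get(genes.get(j)));
                if (r >= threshold) { // a NaN correlation never passes this test
                    edges.add(genes.get(i) + "--" + genes.get(j));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // The (x, y) pairs shown on slide 9.
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        System.out.println("pearson(x, y) = " + pearson(x, y));

        // Made-up expression values for three hypothetical genes (not data from the talk).
        Map<String, double[]> expressions = new LinkedHashMap<>();
        expressions.put("geneA", new double[] {1.0, 2.0, 3.0, 4.0});
        expressions.put("geneB", new double[] {2.0, 4.0, 6.0, 8.5});
        expressions.put("geneC", new double[] {4.0, 1.0, 3.0, 2.0});
        System.out.println(coExpressionEdges(expressions, 0.8));
    }
}
```

For the slide 9 pairs the computed coefficient is close to the value shown on the slide. The pairwise loop is quadratic in the number of genes, which is presumably why the talk distributes this step with map-reduce for 27.000 genes.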
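
The rule from slide 34, difference (A , B) = union (A , B) - B, can be illustrated with ordinary Java sets. This is only a sketch of the set arithmetic, not FluxGraph's implementation: the real `difference` call on slide 43 takes a working set and two dates and returns a graph, whereas here each graph state is assumed to be a set of string-encoded elements, with vertices written as `id{name=...}` and edges as `id-label->id`. The class name and the encoding are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative sketch only; the string encoding of graph elements is an assumption. */
final class GraphDifferenceSketch {

    /** difference(A, B) = union(A, B) - B, computed on sets of graph elements. */
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);    // union(A, B)
        result.removeAll(b); // ... minus B: what is left exists only in A
        return result;
    }

    public static void main(String[] args) {
        // Checkpoint state from the "travel through time" slides: Davy knows Peter.
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
                "v1{name=Davy}", "v2{name=Peter}", "v3{name=Michael}",
                "v1-knows->v2"));

        // Current state: v1 was renamed to David and now also knows Michael.
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
                "v1{name=David}", "v2{name=Peter}", "v3{name=Michael}",
                "v1-knows->v2", "v1-knows->v3"));

        // Leaves the renamed vertex and the new knows edge, as pictured on slide 35.
        System.out.println(difference(current, checkpoint));
    }
}
```

Encoding edges by vertex id rather than by name keeps the unchanged edge to Peter out of the result when only the name property of v1 changes.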