Introduction to Graph Databases @ SAI

  1. Thursday 23.5: graph databases
  2. About me: whoami... Davy Suvee, @DSUVEE
      ➡ big data architect @ datablend-continuum
        • provide big data and NoSQL consultancy
        • share practical knowledge and big data use cases via blog
  3. Big Data 2-3 years ago...
  4. Nowadays... Big Data
  5-6. What is big data? ...large and complex data sets that are difficult to process with traditional database management tools...
      ➡ store (NoSQL)
      ➡ enrich (data mining, ML, NLP, ...)
      ➡ visualize (D3, Gephi, Mapbox, Tableau, ...)
      ➡ process/analyze (map/reduce, CEP, Storm, ...)
  7. The world has changed... Volume, Variety, Velocity
      ➡ Volume: data exceeds the limits of vertically scalable tools, requiring novel storage solutions
      ➡ Variety: data takes different formats that make integration complex and expensive
      ➡ Velocity: data analysis time windows are small compared to the speed of data acquisition
  8. Tackling the volume problem... what we are currently doing:
      ➡ throwing our data away :-(
      ➡ storing preprocessed data :-/
      ➡ trying to store it anyway ;-(
      But why?
  9-13. Tackling the volume problem...
      ➡ vertical scaling: your database on ever bigger hardware, at a cost of €, €², €³, €⁴, ...
      ➡ horizontal scaling: NoSQL spread over more machines, at a cost of € × #nodes
  14. Tackling the variety problem... massive amounts of unstructured data: video, audio, social streams, log files, text
  15. Tackling the variety problem... one schema-structured model (your database) versus a best-fit, schema-less model (NoSQL): key-value databases, document-based databases, graph databases, wide-column databases; AS IS...
  16. Tackling the velocity problem... we want to
      ➡ collect
      ➡ process
      ➡ query
      ➡ analyze
      MASSIVE amounts of unstructured data in real time
  17. Tackling the velocity problem... from slow and outdated information to fast and real-time
      ➡ your stack: APP, SYNC, ETL, BI
      ➡ NoSQL & Big Data: APP, SYNC, Map-Reduce, BI (+ analytics)
  18. Graphs are everywhere...
  19. A little bit of graph theory...
      ➡ node/vertex: Davy (age = 33), Kim (age = 26, gender = F), Datablend (btw = 123...), Janssen (sector = pharma)
      ➡ edge: founded (in: 2011), worked_for (from: 2008, to: 2013), knows (since: 2013)
  20. Advantages...? (graph database)
      ➡ whiteboard friendly
      ➡ schema-less
      ➡ index-free adjacency (no joins!)
      ➡ queries as traversals
      ➡ queries as pattern matching
  21. Advantages...?
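The "index-free adjacency (no joins!)" point can be illustrated with a toy in-memory property graph in plain Java. This is a minimal sketch, not how Neo4j stores data; the classes `SimpleNode` and `SimpleRelationship` are hypothetical. Each node keeps direct references to its own relationships, so following Davy's KNOWS relationships is a walk over his own list instead of a lookup in a global relationship table, which is what a relational join needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy property graph illustrating index-free adjacency:
// every node holds direct references to its own outgoing relationships.
class SimpleNode {
    final Map<String, Object> properties = new HashMap<>();
    final List<SimpleRelationship> outgoing = new ArrayList<>();

    SimpleNode(String name) {
        properties.put("name", name);
    }

    SimpleRelationship createRelationshipTo(SimpleNode other, String type) {
        SimpleRelationship rel = new SimpleRelationship(this, other, type);
        outgoing.add(rel);
        return rel;
    }

    // Follows this node's own relationship list: the cost depends on the
    // node's degree, not on the total number of relationships in the graph.
    List<SimpleNode> neighbours(String type) {
        List<SimpleNode> result = new ArrayList<>();
        for (SimpleRelationship rel : outgoing) {
            if (rel.type.equals(type)) {
                result.add(rel.end);
            }
        }
        return result;
    }
}

// Relationships carry a type and their own properties (e.g. "since").
class SimpleRelationship {
    final SimpleNode start;
    final SimpleNode end;
    final String type;
    final Map<String, Object> properties = new HashMap<>();

    SimpleRelationship(SimpleNode start, SimpleNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}
```

Using the slide's example, `davy.createRelationshipTo(kim, "KNOWS").properties.put("since", 2013)` adds the edge, and `davy.neighbours("KNOWS")` returns Kim without consulting any index.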
  22. Products/projects...?
      ➡ databases: Neo4j, OrientDB, AllegroGraph, DEX, ...
      ➡ processing: Pregel, Giraph, Hama, GoldenOrb, ...
      ➡ APIs: Blueprints
      ➡ query languages: Gremlin, Cypher, SPARQL
  23. Graph database 101 (Neo4j): two nodes, Davy and Kim
        // in Neo4j 1.x, writes must run inside a transaction:
        // Transaction tx = graph.beginTx(); try { ...; tx.success(); } finally { tx.finish(); }
        GraphDatabaseService graph = ...
        Node davy = graph.createNode();
        davy.setProperty("name", "Davy");
        Node kim = graph.createNode();
        kim.setProperty("name", "Kim");
  24. Graph database 101 (Neo4j): Davy knows Kim
        enum RelTypes implements RelationshipType { KNOWS, WORKED_FOR, FOUNDED }
        Relationship davy_kim = davy.createRelationshipTo(kim, RelTypes.KNOWS);
        davy_kim.setProperty("since", 2013);
  25. Graph database 101 (Neo4j): Davy founded Datablend
        Relationship davy_datablend = davy.createRelationshipTo(datablend, RelTypes.FOUNDED);
        davy_datablend.setProperty("in", 2011);
      ➡ how to access the datablend node?
  26. Graph database 101 (Neo4j): finding nodes through an index
        Index<Node> nodeIndex = graph.index().forNodes("nodes");
        Node datablend = graph.createNode();
        datablend.setProperty("name", "Datablend");
        nodeIndex.add(datablend, "name", "Datablend");
        Node found = nodeIndex.get("name", "Datablend").getSingle();
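  27. Graph database 101 (Neo4j)
      ➡ find friends of my friends...
        import org.neo4j.graphdb.Direction;
        import org.neo4j.graphdb.Path;
        import org.neo4j.graphdb.traversal.Evaluators;
        import org.neo4j.graphdb.traversal.TraversalDescription;
        import org.neo4j.graphdb.traversal.Traverser;
        import org.neo4j.kernel.Traversal;
        TraversalDescription td = Traversal.description()
                .breadthFirst()
                .relationships(RelTypes.KNOWS, Direction.OUTGOING)
                // toDepth(2) keeps paths of length 0, 1 and 2:
                // Davy himself, his friends and their friends
                .evaluator(Evaluators.toDepth(2));
        Traverser traverser = td.traverse(davy);
        for (Path path : traverser) { ... }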
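What the traversal above returns can be mimicked in plain Java. The sketch below is hypothetical (class `KnowsTraversal`, string node names) and only assumes breadth-first order, outgoing KNOWS edges, a depth cut-off, and visiting each node at most once; Neo4j's traversal framework offers more options (uniqueness policies, custom evaluators) than this. Like `Evaluators.toDepth(2)`, it reports the start node (depth 0) and direct friends (depth 1) as well as friends of friends (depth 2).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical breadth-first traversal over outgoing KNOWS edges with a
// depth cut-off; each node is visited at most once.
class KnowsTraversal {
    private final Map<String, List<String>> knows = new HashMap<>();

    void addKnows(String from, String to) {
        knows.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Every reached node with its depth (0 = start node), in breadth-first order.
    Map<String, Integer> traverse(String start, int maxDepth) {
        Map<String, Integer> depthOf = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int depth = depthOf.get(current);
            if (depth == maxDepth) {
                continue; // prune: do not expand beyond maxDepth
            }
            for (String next : knows.getOrDefault(current, Collections.emptyList())) {
                if (!depthOf.containsKey(next)) { // already visited nodes are skipped
                    depthOf.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        return depthOf;
    }
}
```

Keeping only the entries whose depth equals 2 leaves just the friends of friends.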
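  28. Graph database 101 (Neo4j)
      ➡ find friends of my friends...
        import java.util.Map;
        import org.neo4j.cypher.javacompat.ExecutionEngine;
        import org.neo4j.cypher.javacompat.ExecutionResult;
        // node_auto_index requires node auto-indexing to be enabled for "name"
        String query = "START davy=node:node_auto_index(name = \"Davy\") "
                     + "MATCH davy-[:KNOWS]->()-[:KNOWS]->fof "
                     + "RETURN davy, fof";
        ExecutionEngine engine = new ExecutionEngine(graph);
        ExecutionResult result = engine.execute(query);
        for (Map<String, Object> row : result) { ... }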
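The Cypher query expresses the same question as a pattern. As a hypothetical plain-Java sketch of what `davy-[:KNOWS]->()-[:KNOWS]->fof` matches: every pair of distinct KNOWS relationships where the first starts at Davy and the second starts where the first ends. Cypher does not reuse a relationship within one match, but it returns one row per match, so a friend of a friend reachable over two paths shows up twice, and Davy himself can appear if a friend knows him back; `RETURN DISTINCT` would collapse duplicates. The class `TwoHopPattern` is an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of two-hop pattern matching over KNOWS relationships.
class TwoHopPattern {
    static final class Rel {
        final String from;
        final String to;

        Rel(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    private final List<Rel> knows = new ArrayList<>();

    void addKnows(String from, String to) {
        knows.add(new Rel(from, to));
    }

    // Returns the 'fof' binding of every match (one entry per match).
    List<String> match(String start) {
        List<String> rows = new ArrayList<>();
        for (Rel r1 : knows) {
            if (!r1.from.equals(start)) {
                continue;
            }
            for (Rel r2 : knows) {
                // a relationship may not be used twice within one match
                if (r2 != r1 && r2.from.equals(r1.to)) {
                    rows.add(r2.to);
                }
            }
        }
        return rows;
    }
}
```

Unlike the traversal, which visits each node once, the pattern enumerates paths; that is the practical difference between "queries as traversals" and "queries as pattern matching" on slide 20.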
  29. Use cases...? (graph database)
      ➡ recommendations
      ➡ access control
      ➡ routing
      ➡ social computing/networks
      ➡ genealogy
  30-32. Insights in big data
      ➡ typical approach through warehousing
        ★ star schema with fact tables and dimension tables
  33. Insights in big data
        ★ real-time visualization
        ★ filtering
        ★ metrics
        ★ layouting
        ★ modular [1, 2]
      [1] http://gephi.org/plugins/neo4j-graph-database-support/
      [2] http://github.com/datablend/gephi-blueprints-plugin
  34. Gene expression clustering
      ➡ oncology data set:
        ★ 4,800 samples
        ★ 27,000 genes
      ➡ question:
        ★ for a particular subset of samples, which genes are co-expressed?
  35. MongoDB for storing gene expressions
        { "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
          "sample_name" : "122551hp133a21.cel",
          "genomics_id" : 122551,
          "sample_id" : 343981,
          "donor_id" : 143981,
          "sample_type" : "Tissue",
          "sample_site" : "Ascending colon",
          "pathology_category" : "MALIGNANT",
          "pathology_morphology" : "Adenocarcinoma",
          "pathology_type" : "Primary malignant neoplasm of colon",
          "primary_site" : "Colon",
          "expressions" : [ { "gene" : "X1_at", "expression" : 5.54217719084415 },
                            { "gene" : "X10_at", "expression" : 3.92335121981739 },
                            { "gene" : "X100_at", "expression" : 7.81638155662255 },
                            { "gene" : "X1000_at", "expression" : 5.44318512260619 },
                            … ] }
  36. Pearson correlation through map-reduce
        x    y
        43   99
        21   65
        25   79
        42   75
        57   87
        59   81
      Pearson correlation: r ≈ 0.5298
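The slide does not show how the map-reduce job is organised. One common approach, assumed here, is to compute Pearson's r from sums that can be accumulated independently and then added: n, Σx, Σy, Σxy, Σx² and Σy², with r = (nΣxy - ΣxΣy) / (√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)). In the hypothetical `PearsonSums` class below, `add` plays the map-side accumulation and `merge` the reduce step.

```java
// Hypothetical sketch: Pearson correlation from mergeable partial sums.
class PearsonSums {
    long n;
    double sumX, sumY, sumXY, sumXX, sumYY;

    // Map side: fold one (x, y) observation into the running sums.
    void add(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXY += x * y;
        sumXX += x * x;
        sumYY += y * y;
    }

    // Reduce side: partial sums from different mappers simply add up.
    PearsonSums merge(PearsonSums other) {
        PearsonSums m = new PearsonSums();
        m.n = n + other.n;
        m.sumX = sumX + other.sumX;
        m.sumY = sumY + other.sumY;
        m.sumXY = sumXY + other.sumXY;
        m.sumXX = sumXX + other.sumXX;
        m.sumYY = sumYY + other.sumYY;
        return m;
    }

    double correlation() {
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt(n * sumXX - sumX * sumX)
                * Math.sqrt(n * sumYY - sumY * sumY);
        return numerator / denominator;
    }
}
```

Splitting the six (x, y) pairs from the slide over two partial sums and merging them gives r ≈ 0.5298, the same as a single pass. This one-pass formula can lose precision when values are large and nearly constant; a two-pass or Welford-style update is more robust.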
  37. Co-expression graph
      ➡ create a node for each gene
      ➡ if the correlation between two genes is >= 0.8, draw an edge between both nodes
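The co-expression rule can be sketched as follows. It assumes each gene's expression values are available as an array in the same sample order (for example gathered from the `expressions` arrays of the MongoDB documents for the selected samples); the class `CoExpressionGraph`, its methods and the toy values in the test are hypothetical. Genes are compared pairwise, which is quadratic: for 27,000 genes that is roughly 364 million pairs, one reason to distribute the correlation step with map-reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one node per gene, an undirected edge between two
// genes whose expression profiles have Pearson r >= threshold.
class CoExpressionGraph {
    // Pearson correlation of two equally long expression profiles.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // NaN for a constant profile
    }

    // Edges as "geneA -- geneB", genes in sorted order.
    static List<String> edges(Map<String, double[]> expressions, double threshold) {
        List<String> genes = new ArrayList<>(new TreeMap<>(expressions).keySet());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                double r = pearson(expressions.get(genes.get(i)), expressions.get(genes.get(j)));
                if (r >= threshold) { // NaN never passes this comparison
                    result.add(genes.get(i) + " -- " + genes.get(j));
                }
            }
        }
        return result;
    }
}
```

Negative correlations (genes moving in opposite directions) get no edge under this rule, because it only looks at r >= 0.8.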
  38. Co-expression graph
  39-42. Mutation prevalence
  43. Analyzing running data
        <trkpt lon="4.723870977759361" lat="51.075748661533">
            <ele>29.799999237060547</ele>
            <time>2011-11-08T19:18:39.000Z</time>
        </trkpt>
        <trkpt lon="4.724105251953006" lat="51.075623352080584">
            <ele>29.799999237060547</ele>
            <time>2011-11-08T19:18:45.000Z</time>
        </trkpt>
        <trkpt lon="4.724143054336309" lat="51.07560558244586">
            <ele>29.799999237060547</ele>
            <time>2011-11-08T19:18:46.000Z</time>
        </trkpt>
  44. Analyzing running data through Neo4j
      ➡ using the Neo4j Spatial extension
      ➡ create a node for each tracked point
        // nearest tracked point within 0.02 km of the coordinate 'to'
        List<GeoPipeFlow> closests = GeoPipeline
                .startNearestNeighborLatLonSearch(runningLayer, to, 0.02)
                .sort("OrthodromicDistance")
                .getMin("OrthodromicDistance")
                .toList();
      ➡ connect succeeding tracking nodes in a graph
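As a library-free illustration of slide 44 (hypothetical class `RunningTrack`; a brute-force scan instead of Neo4j Spatial's index), the sketch below keeps track points in recording order, links each point to its successor, and ranks candidates by great-circle (orthodromic) distance using the haversine formula on a spherical Earth of radius 6371 km. With the three `trkpt` samples from slide 43, the first two points are about 21.5 m apart, so a 0.02 km search around the second point finds only the second and third points.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: track points in recording order, succession edges,
// and a radius search ordered by great-circle distance.
class RunningTrack {
    static final double EARTH_RADIUS_KM = 6371.0;

    final List<double[]> points = new ArrayList<>(); // each entry is {lat, lon}

    void addPoint(double lat, double lon) {
        points.add(new double[] {lat, lon});
    }

    // "Connect succeeding tracking nodes": an edge from point i to point i + 1.
    List<int[]> successionEdges() {
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i + 1 < points.size(); i++) {
            edges.add(new int[] {i, i + 1});
        }
        return edges;
    }

    // Haversine great-circle distance in kilometres.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    double distanceTo(double lat, double lon, int index) {
        double[] p = points.get(index);
        return haversineKm(lat, lon, p[0], p[1]);
    }

    // Indices of points within maxKm of (lat, lon), nearest first.
    List<Integer> within(double lat, double lon, double maxKm) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < points.size(); i++) {
            if (distanceTo(lat, lon, i) <= maxKm) {
                hits.add(i);
            }
        }
        hits.sort(Comparator.comparingDouble(i -> distanceTo(lat, lon, i)));
        return hits;
    }
}
```

The first element of `within(...)` plays the role of the nearest neighbour that the `getMin("OrthodromicDistance")` step selects on the slide.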
  45. Analyzing running data
  46. Analyzing Google Analytics data
      ➡ source URL -> target URL
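The slide only names the edge direction, source URL -> target URL. A minimal sketch, assuming the analytics export has already been reduced to (source, target) page pairs, counts each transition as the weight of a directed edge. `NavigationGraph` and the URLs in the test are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a weighted directed graph of page transitions.
class NavigationGraph {
    // source URL -> (target URL -> number of observed transitions)
    private final Map<String, Map<String, Integer>> out = new HashMap<>();

    void record(String sourceUrl, String targetUrl) {
        out.computeIfAbsent(sourceUrl, k -> new HashMap<>())
           .merge(targetUrl, 1, Integer::sum);
    }

    int weight(String sourceUrl, String targetUrl) {
        return out.getOrDefault(sourceUrl, Collections.emptyMap())
                  .getOrDefault(targetUrl, 0);
    }

    // Most frequent next page after sourceUrl, or null if none was recorded.
    String mostLikelyNext(String sourceUrl) {
        String best = null;
        int bestWeight = 0;
        for (Map.Entry<String, Integer> e
                : out.getOrDefault(sourceUrl, Collections.emptyMap()).entrySet()) {
            if (e.getValue() > bestWeight) {
                best = e.getKey();
                bestWeight = e.getValue();
            }
        }
        return best;
    }
}
```

Once the transitions are in a graph, questions such as "where do visitors go after the blog?" become simple neighbour lookups.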
  47. Graphs and time...
      ➡ FluxGraph: a Blueprints-compatible graph on top of Datomic
      ➡ make FluxGraph fully time-aware
        ★ travel your graph through time
        ★ time-scoped iteration of vertices and edges
        ★ temporal graph comparison
      ➡ towards a time-aware graph...
      ➡ reproducible graph state
  48-52. Travel through time: Davy, Kim and Peter, with Davy knows Kim
        FluxGraph fg = new FluxGraph();
        Vertex davy = fg.addVertex();
        davy.setProperty("name", "Davy");
        Vertex kim = ...
        Vertex peter = ...
        Edge e1 = fg.addEdge(davy, kim, "knows");
  53-57. Travel through time: take a checkpoint, then rename Davy to David and let him know Peter
        Date checkpoint = new Date();
        davy.setProperty("name", "David");
        Edge e2 = fg.addEdge(davy, peter, "knows");
  58-60. Travel through time
      ➡ at the checkpoint: Davy knows Kim; Peter has no edges yet
      ➡ at the current time: David knows Kim and David knows Peter
      ➡ by default the graph shows the current time; to see it as it was at the checkpoint:
        fg.setCheckpointTime(checkpoint);
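To make the checkpoint semantics concrete, here is a toy time-aware graph in plain Java. It is not FluxGraph's implementation; it only assumes that every property write and every edge creation is stored with a logical timestamp, and that reading "as of" a time returns the latest write at or before it. `TimeTravelGraph`, its methods and the integer timestamps are hypothetical. Replaying slides 48-57 gives "Davy knows Kim" at the checkpoint and "David knows Kim", "David knows Peter" now.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy time-aware graph: writes are kept with their logical time,
// reads are answered as of a checkpoint. NOW stands for "current time".
class TimeTravelGraph {
    static final long NOW = Long.MAX_VALUE;

    private static final class Edge {
        final String from, label, to;
        final long created;

        Edge(String from, String label, String to, long created) {
            this.from = from;
            this.label = label;
            this.to = to;
            this.created = created;
        }
    }

    // vertex id -> property key -> (write time -> value)
    private final Map<String, Map<String, NavigableMap<Long, Object>>> props = new HashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    void setProperty(long time, String vertex, String key, Object value) {
        props.computeIfAbsent(vertex, v -> new HashMap<>())
             .computeIfAbsent(key, k -> new TreeMap<>())
             .put(time, value);
    }

    void addEdge(long time, String from, String label, String to) {
        edges.add(new Edge(from, label, to, time));
    }

    // Latest value written at or before asOf, or null if there is none.
    Object getProperty(long asOf, String vertex, String key) {
        NavigableMap<Long, Object> versions =
                props.getOrDefault(vertex, Collections.emptyMap()).get(key);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, Object> latest = versions.floorEntry(asOf);
        return latest == null ? null : latest.getValue();
    }

    // Edges that existed at asOf, rendered with the vertex names valid then.
    List<String> edgesAsOf(long asOf) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.created <= asOf) {
                result.add(getProperty(asOf, e.from, "name") + " " + e.label + " "
                        + getProperty(asOf, e.to, "name"));
            }
        }
        return result;
    }
}
```

Because nothing is overwritten, any earlier state can be reconstructed, which is what "reproducible graph state" on slide 47 asks for.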
  61-67. Time-scoped iteration
      ➡ every change creates a new version of a vertex: Davy (t1), Davy' (t2), Davy'' (t3), Davy''' (t_current)
      ➡ how to find the version of the vertex you are interested in? Navigate with previous/next:
        Vertex previousDavy = davy.getPreviousVersion();
        Iterable<Vertex> allDavy = davy.getNextVersions();
        Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
        Interval valid = davy.getTimerInterval();
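The previous/next navigation can be sketched as a doubly linked chain of versions, each valid during a half-open interval [start, end). The method names below echo the slide, but the class `VertexVersion`, its fields and `isValidAt` are hypothetical illustrations, not FluxGraph's API; the slide's `Interval` is modelled here as plain `start`/`end` fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical version chain; each version is valid during [start, end).
class VertexVersion {
    final String name;
    final long start;
    long end = Long.MAX_VALUE; // open until a newer version replaces it
    private VertexVersion previous;
    private VertexVersion next;

    VertexVersion(String name, long start) {
        this.name = name;
        this.start = start;
    }

    // Records a change at time t on the latest version: closes this
    // version's interval and links a new one after it.
    VertexVersion change(String newName, long t) {
        VertexVersion newer = new VertexVersion(newName, t);
        this.end = t;
        this.next = newer;
        newer.previous = this;
        return newer;
    }

    VertexVersion getPreviousVersion() {
        return previous;
    }

    // All newer versions, oldest first.
    List<VertexVersion> getNextVersions() {
        List<VertexVersion> result = new ArrayList<>();
        for (VertexVersion v = next; v != null; v = v.next) {
            result.add(v);
        }
        return result;
    }

    // Older versions accepted by the filter, newest first.
    List<VertexVersion> getPreviousVersions(Predicate<VertexVersion> filter) {
        List<VertexVersion> result = new ArrayList<>();
        for (VertexVersion v = previous; v != null; v = v.previous) {
            if (filter.test(v)) {
                result.add(v);
            }
        }
        return result;
    }

    boolean isValidAt(long t) {
        return start <= t && t < end;
    }
}
```

Starting from Davy at t1 and changing him at t2, t3 and t4, the last version's `getPreviousVersions(v -> v.start >= 2)` yields Davy'' and Davy', newest first.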
  68. Temporal graph comparison: what changed between the checkpoint (Davy knows Kim) and the current time (David knows Kim and Peter)?
  69. Temporal graph comparison
      ➡ difference(A, B) = union(A, B) - B
      ➡ ... as an (immutable) graph!
      ➡ difference(current graph, checkpoint graph) = David with the new knows edge
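A sketch of the comparison operator, assuming each graph state is represented as a set of edges identified by vertex ids rather than names (so the rename from Davy to David does not by itself create a different edge). `GraphDiff` and the edge strings are hypothetical. Note that union(A, B) - B is simply A - B; the union form mirrors the slide. The result is wrapped as an unmodifiable set, echoing the "immutable graph" remark.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of temporal graph comparison over edge sets.
final class GraphDiff {
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> u = new LinkedHashSet<>(a);
        u.addAll(b);
        return u;
    }

    // difference(A, B) = union(A, B) - B, returned as an unmodifiable set.
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> d = union(a, b);
        d.removeAll(b);
        return Collections.unmodifiableSet(d);
    }
}
```

With the current state {v1 -knows-> v2, v1 -knows-> v3} and the checkpoint state {v1 -knows-> v2}, the difference is the single new edge v1 -knows-> v3 (David knows Peter); in the other direction it is empty.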
  70. Use case: longitudinal patient data
      ➡ t1: patient
      ➡ t2: patient, smoking
      ➡ t3: patient, smoking
      ➡ t4: patient, cancer
      ➡ t5: patient, cancer, death
  71. Use case: longitudinal patient data
      ➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
      ➡ example analysis:
        ★ if a male patient is no longer smoking in 2005,
        ★ what are the chances of getting lung cancer in 2010, comparing patients that smoked before 2005 with patients that never smoked?
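The example analysis boils down to comparing two rates. The sketch below assumes a simplified record per patient (sex, the set of years in which the patient smoked, and the year of a lung cancer diagnosis, 0 if none) and reads "getting lung cancer in 2010" as a diagnosis in 2010. `Patient`, `SmokingAnalysis` and the toy data in the test are hypothetical and not taken from the 15,000-patient data set.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified patient timeline.
final class Patient {
    final boolean male;
    final Set<Integer> smokingYears; // years in which the patient smoked
    final int lungCancerYear;        // year of diagnosis, 0 if none

    Patient(boolean male, Set<Integer> smokingYears, int lungCancerYear) {
        this.male = male;
        this.smokingYears = smokingYears;
        this.lungCancerYear = lungCancerYear;
    }
}

final class SmokingAnalysis {
    // Males not smoking in quitYear who smoked in some earlier year.
    static List<Patient> formerSmokers(List<Patient> all, int quitYear) {
        return all.stream()
                .filter(p -> p.male && !p.smokingYears.contains(quitYear))
                .filter(p -> p.smokingYears.stream().anyMatch(y -> y < quitYear))
                .collect(Collectors.toList());
    }

    // Males who never smoked at all.
    static List<Patient> neverSmokers(List<Patient> all) {
        return all.stream()
                .filter(p -> p.male && p.smokingYears.isEmpty())
                .collect(Collectors.toList());
    }

    // Share of the group diagnosed with lung cancer in the given year.
    static double rate(List<Patient> group, int year) {
        if (group.isEmpty()) {
            return Double.NaN;
        }
        long cases = group.stream().filter(p -> p.lungCancerYear == year).count();
        return (double) cases / group.size();
    }
}
```

Comparing `rate(formerSmokers(all, 2005), 2010)` with `rate(neverSmokers(all), 2010)` answers the question on the slide; in a time-aware graph, the smoking and diagnosis facts would come from the patient's versions over time.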
  72. FluxGraph: http://github.com/datablend/fluxgraph
      ➡ available on GitHub
  73. Open Innovation Networking Tool
      ➡ many different projects, many different partners, many different domains...
        ★ how do we keep track?
        ★ how can we learn from the data?
      ➡ store the data in its most natural form, a graph
      ➡ use graph algorithms to identify the importance of each node and of its related nodes
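The slide does not say which graph algorithms measure a node's importance. PageRank is one common choice and is used here purely as an illustration: a minimal power-iteration sketch over nodes numbered 0..n-1 (for instance projects, partners and domains), with damping factor d and dangling nodes spreading their rank evenly. `PageRank.compute` is a hypothetical helper, not part of any library mentioned in the talk.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical PageRank by power iteration.
final class PageRank {
    // links.get(i) holds the nodes that node i points to (nodes are 0..n-1).
    static double[] compute(List<int[]> links, double damping, int iterations) {
        int n = links.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n); // random jump to any node
            for (int i = 0; i < n; i++) {
                int[] out = links.get(i);
                if (out.length == 0) {
                    // dangling node: spread its rank evenly over all nodes
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * rank[i] / n;
                    }
                } else {
                    for (int target : out) {
                        next[target] += damping * rank[i] / out.length;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

For example, if nodes 1, 2 and 3 all link to node 0 and node 0 links to node 1, node 0 ends up with the highest rank, followed by node 1; the ranks always sum to 1.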
  74-75. Open Innovation Networking Tool
  76. More graphs...
      ➡ pharma
      ➡ geospatial
      ➡ dependency analysis
      ➡ ontology
      ➡ ...
  77. Questions?
  78. E-mail: info@datablend.be
      Follow us: twitter.com/data_blend
      www.datablend.be, 0499/05.00.89
      datablend-continuum
