(ATS6-PLAT03) What's behind Discngine collections

  1. 1. ATS6-PLAT03: What's behind Discngine collections. Accelrys Tech Summit 2013. Eric Le Roux and Vincent Le Guilloux | May 2013
  2. 2. Agenda. Discngine. Tibco Spotfire Connector: ► How it works ► Integration challenges. Graph collection: ► Quick introduction to graphs ► Implementation approaches (in-memory and graph databases) ► Quick demo / use case
  3. 3. Discngine: Scientific computing consulting services and solutions for pharmaceutical research. Customers: Sanofi, L’Oréal, IPSEN, Novartis, Roche, Pierre Fabre, CEREP, P&G, Servier, Cephalon, Tibotec-Virco, Galapagos, Biofocus… Founded in 2004. Based in Paris, France. 17 consultants. Come visit our booth for more information & demos
  4. 4. Tibco Spotfire Pipeline Pilot Connector: How does it work?
  5. 5. Tibco Spotfire Pipeline Pilot Connector: Demo
  6. 6. Tibco Spotfire Pipeline Pilot Connector: Architecture. Pipeline Pilot Server; Tibco Spotfire Server; Discngine TS Connector Collection; Discngine Web Panel; Client Management; Template storage
  7. 7. Tibco Spotfire Pipeline Pilot Connector: Architecture. Pipeline Pilot Server; Tibco Spotfire Server; Discngine TS Connector Collection; Discngine Web Panel; Client Management; Template storage; JavaScript / C# wrapper
  8. 8. Tibco Spotfire Pipeline Pilot Connector: Architecture. Pipeline Pilot Server; Tibco Spotfire Server; Discngine TS Connector Collection; Discngine Web Panel; Client Management; Template storage; Reporting-collection-based custom components
  9. 9. Tibco Spotfire Pipeline Pilot Connector: Architecture. Pipeline Pilot Server; Tibco Spotfire Server; Discngine TS Connector Collection; Discngine Web Panel; Client Management; Template storage; Oracle Application Express; Other web server
  10. 10. Tibco Spotfire Pipeline Pilot Connector: Execution flow (basic protocol). 1. The Pipeline Pilot protocol runs. 2. The Pipeline Pilot protocol generates an HTML page. 3. The HTML page is rendered in an Internet Explorer .NET control inside the Discngine Web Panel. 4. A JavaScript instruction is executed. 5. A Spotfire C# API function is called. 6. The HTML page rendering ends.
  11. 11. Tibco Spotfire Pipeline Pilot Connector: Demo: Building protocols
  12. 12. Tibco Spotfire Pipeline Pilot Connector: Integration challenges
  13. 13. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► API encapsulation: 9000+ methods & properties; 28 components
  14. 14. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► API encapsulation. Example: "Event Listener", a single component to: • Listen to marking events • Create a hidden form • Capture the identifiers of marked records • Submit marked records to a PP protocol
  15. 15. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Component parameters mapping & wording: Do you speak Pipelinish?
  16. 16. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Component parameters mapping & wording: No, I speak Spotfirish!
  17. 17. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Component parameters mapping & wording: How to capture advanced color gradients with component parameters? Workaround: Spotfire templates
  18. 18. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Client & server datasets synchronization. Data consistency: end-users can modify the data context on the client side: • Compute new columns • Add & remove rows • Drop & create data tables • Initialize data sets on the client (new .dxp file)
  19. 19. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Client & server datasets synchronization • Option 1: HTTP POST. Pipeline Pilot Server; Discngine Web Panel; data tables in .stdf format; implemented in v1.2
  20. 20. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Client & server datasets synchronization • Option 2: File copy. Pipeline Pilot Server; Discngine Web Panel; file system; data tables in .stdf format; .stdf reader component; implemented in v1.2
  21. 21. Web Mashups. Come visit our booth for more information & demos
  22. 22. The Graph collection
  23. 23. Agenda: (Quick) introduction to graphs; Graphs in Pipeline Pilot; Demo; Graph databases in Pipeline Pilot; Demo
  24. 24. What is a graph? A graph is a data structure representing objects (nodes) that are connected to each other by links (edges, or relationships).
  25. 25. What is a graph? A graph is a data structure representing objects (nodes) that are connected to each other by links (edges, or relationships). Diagram labels: node; undirected edge; directed edge.
  26. 26. Property Graph Data Model
  27. 27. Property Graph Data Model. Nodes: Protein A; Protein B; Molecule 1; Molecule 2
  28. 28. Property Graph Data Model. Nodes: Protein A; Protein B; Molecule 1; Molecule 2. Edge labels: interact; similar; inhibits; shareFragment
  29. 29. Property Graph Data Model. Nodes: Protein A; Protein B; Molecule 1; Molecule 2. Edge labels: interact; similar; inhibits; shareFragment. Properties: LogP = 1.1; pIC50 = 6.8
  30. 30. Graphs: when and why? Graph problems ► You need graphs if you have a problem that requires algorithms from graph theory: • Shortest path (GPS systems) • Motif search (substructure search in molecules) • Importance measures (Google’s PageRank)
  31. 31. Graphs: when and why? Visualization ► You may want to use graphs as an intuitive way to represent objects and their relationships: • Subway map • Metabolic pathways • Protein-protein interaction networks • Molecule depiction
  32. 32. Graphs: when and why? Data modeling (NoSQL / Big Data hype) ► You can use graphs as a flexible data model when your data consists of objects and relationships between them: • Google’s Knowledge Graph • Facebook Graph Search
  33. 33. Discngine Graph Collection. Manage graphs as Pipeline Pilot data records: ► Creation and manipulation ► Algorithms ► Persistence / IO ► Visualization ► Traversals (the “SQL” of graphs)
  34. 34. The big question: How can we represent graphs in the data flow? ► A graph is not flat ► A graph has different types of data ► Advanced data structures are required to operate efficiently on graphs
  35. 35. The big question: How can we represent graphs in the data flow? Pipeline Pilot data model. Pros: native; user and developer friendly. Cons: no objects, methods, etc.; no Fibonacci heap, FIFO / LIFO queues, etc.; the record hierarchy is a tree
  36. 36. The big question: How can we represent graphs in the data flow? Pipeline Pilot data model. Pros: native; user and developer friendly. Cons: no objects, methods, etc.; no Fibonacci heap, FIFO / LIFO queues, etc.; the record hierarchy is a tree. Java / Perl API. Pros: advanced programming framework; exposes most functions required to deal with data records. Cons: performance overhead induced by interfacing C++ and Java / Perl
  37. 37. The answer: How can we represent graphs in the data flow? A mixed solution: ► Java for performance and advanced data structures / object-oriented API ► Expose part of the data and processes via the data record tree and PilotScript
  38. 38. PilotGraph Hierarchy
  39. 39. PilotGraph Hierarchy
  40. 40. PilotGraph Hierarchy: Root node of a data record
  41. 41. PilotGraph Hierarchy: Group node containing node records
  42. 42. PilotGraph Hierarchy: Group node containing edge records
  43. 43. PilotGraph Hierarchy: Nodes containing properties
  44. 44. PilotGraph in Java
  45. 45. PilotGraph in Java
  46. 46. PilotGraph in Java: https://github.com/tinkerpop/blueprints/wiki
  47. 47. Demo
  48. 48. PilotGraph Model: cons. Java consumes memory. Java has a limited amount of memory allocated per job ► 384 MB on a 64-bit server; see apps/scitegic/core/xml/Objects/JavaEnvironment.xml. Serialization is OK for small to medium graphs, but the bigger the graph, the longer the serialization process takes
  49. 49. Graph Databases: Graph databases are persistent engines dedicated to the storage of graph data structures. The graph database stack (not exhaustive): ► Neo4j ► OrientDB ► HyperGraphDB ► Titan ► Dex ► InfiniteGraph ► AllegroGraph
  50. 50. PilotGraph vs DatabaseGraph. PilotGraph (record): ~300,000 elements (depends on the amount of memory allocated to Java)
  51. 51. PilotGraph vs DatabaseGraph. PilotGraph (record): ~300,000 elements (depends on the amount of memory allocated to Java). DatabaseGraph (connection): millions to billions of elements
  52. 52. Graph database workflow
  53. 53. Demos
  54. 54. Take home message: What is the best way to manage graphs within Pipeline Pilot? ► Take advantage of the PP Java API, which is the best tradeoff between performance and flexibility ► Expose as much of the data as possible via the data record hierarchy and PilotScript ► Use a common API to manage in-memory and persistent graph databases transparently
  55. 55. Thank you for your attention. Traversals, Visualization, Reporting Integration, Algorithms, Roadmap… Welcome to our booth
  56. 56. www.discngine.com Thanks!
  57. 57. Graph Collection v2.0. BASIC MANIPULATIONS ► Add / remove elements • From cache • From records ► PilotScript facilities • Remove elements with PilotScript • Set property values ► Add / remove / keep properties ► Join graph records ► Intersect graph records ► Extract edges and nodes ► Key-value property search ► Traversal framework. GRAPH ALGORITHMS ► Shortest path (weighted / unweighted) ► Minimum spanning tree ► Cliques ► Disconnected sub-graphs ► Articulators ► Subgraph matching. IMPORTANCE MEASURES ► Degree centrality ► Closeness centrality ► Density ► Distance to query
  58. 58. Graph Collection v2.0. VISUALISATION ► Layouts • ARF • Fruchterman-Reingold • GraphViz ► GraphViz integration ► HTML5 interactive viewer ► Cytoscape Web report. REPORTING INTEGRATION ► GraphViz image report ► HTML5 graph report (prototype) ► Cytoscape Web report (prototype). READERS AND WRITERS ► GraphML ► SIF (Cytoscape) ► GEXF. GRAPH DATABASE ► Neo4j integration ► ACID transactions ► Algorithms can be applied on graph databases in a transparent way ► Scales to millions of nodes and edges
  59. 59. Traversal? “I have an active molecule on protein P. Which other protein(s) could potentially be inhibited by this molecule?” Step 0: Find your query in the graph. Diagram labels: Query
  60. 60. Traversal? “I have an active molecule on protein P. Which other protein(s) could potentially be inhibited by this molecule?” Step 1: Fetch similar molecules: walk through “similar” relationships. Diagram labels: Query; similar
  61. 61. Traversal? “I have an active molecule on protein P. Which other protein(s) could potentially be inhibited by this molecule?” Step 1: Fetch similar molecules: save the molecules. Diagram labels: Query; similar; Mol; Mol
  62. 62. Traversal? “I have an active molecule on protein P. Which other protein(s) could potentially be inhibited by this molecule?” Step 2: Fetch associated proteins: walk through “activates” and “inhibits” relationships (and anything else related to our problem). Diagram labels: Query; similar; Mol; Mol; inhibits, pIC50 = 8.8
  63. 63. Traversal? “I have an active molecule on protein P. Which other protein(s) could potentially be inhibited by this molecule?” Step 3: Collect the (potential!) winners. Diagram labels: Query; similar; Mol; Mol; inhibits, pIC50 = 8.8; Protein B; Protein C
  64. 64. Protein-Protein interaction networks: Proteins are linked if they interact
  65. 65. Protein-Protein interaction networks: Hubs: highly connected proteins
  66. 66. Protein-Protein interaction networks: Articulators: central proteins that, if removed (i.e. inhibited), will disconnect two functional modules
  67. 67. Protein-Protein interaction networks: Articulators: central proteins that, if removed (i.e. inhibited), will disconnect two functional modules. Candidates for inhibition? Side effects?
  68. 68. SAR Analysis: Similarity networks. Edge label: similar, Tanimoto = 0.98
  69. 69. SAR Analysis: Similarity networks (PubChem CYP3A4 inhibition assay, AID 884). Cluster of low activity; cluster of high activity
  70. 70. SAR Analysis: Activity cliffs. pIC50 = 5.1; pIC50 = 6.9
  71. 71. SAR Analysis: Single-point substitution analysis
  72. 72. Scaffold Network Display
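
The traversal walked through in slides 59 to 63 can be sketched in plain Python. This is only an illustration of the idea, not the Graph Collection or Blueprints API: the `Edge` tuple, the `neighbours` and `candidate_proteins` helpers, and every node, edge, and property value below are hypothetical toy data chosen to mirror the slide labels.

```python
from collections import namedtuple

# A labelled, directed edge carrying a property dictionary (property graph model).
Edge = namedtuple("Edge", ["source", "label", "target", "props"])

# Hypothetical toy graph; names and values are illustrative assumptions.
NODES = {
    "Query": {"type": "molecule"},
    "Mol1": {"type": "molecule"},
    "Mol2": {"type": "molecule"},
    "Protein P": {"type": "protein"},
    "Protein B": {"type": "protein"},
    "Protein C": {"type": "protein"},
}

EDGES = [
    Edge("Query", "inhibits", "Protein P", {"pIC50": 7.2}),
    Edge("Query", "similar", "Mol1", {"Tanimoto": 0.98}),
    Edge("Query", "similar", "Mol2", {"Tanimoto": 0.91}),
    Edge("Mol1", "inhibits", "Protein B", {"pIC50": 8.8}),
    Edge("Mol2", "activates", "Protein C", {}),
    Edge("Mol2", "inhibits", "Protein P", {"pIC50": 6.1}),
]


def neighbours(node, labels, undirected=False):
    """Return nodes reached from `node` through edges whose label is in `labels`."""
    found = []
    for edge in EDGES:
        if edge.label not in labels:
            continue
        if edge.source == node:
            found.append(edge.target)
        elif undirected and edge.target == node:
            found.append(edge.source)
    return found


def candidate_proteins(query, known_target):
    # Step 0: find the query in the graph.
    if query not in NODES:
        raise KeyError(f"{query!r} is not in the graph")
    # Step 1: walk through "similar" relationships and save the molecules.
    similar_molecules = neighbours(query, {"similar"}, undirected=True)
    # Step 2: walk through "activates" and "inhibits" relationships.
    proteins = set()
    for molecule in similar_molecules:
        proteins.update(neighbours(molecule, {"activates", "inhibits"}))
    # Step 3: collect the potential winners, leaving out the known target.
    proteins.discard(known_target)
    return sorted(proteins)


print(candidate_proteins("Query", "Protein P"))  # prints ['Protein B', 'Protein C']
```

A real traversal framework would typically also filter on edge properties (for example a minimum Tanimoto or pIC50 threshold); the `props` dictionaries are kept here so that such a filter could be added.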

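
The hubs and articulators from the protein-protein interaction slides (64 to 67) can be illustrated with a short, self-contained sketch. Articulators correspond to what graph theory calls articulation points (cut vertices). The code below finds them by brute force, removing each node in turn and recounting connected components, which is simple to follow but slower than the linear-time algorithm a production library would use. The network and all function names are hypothetical and not taken from the Graph Collection.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) interaction pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def count_components(adjacency, skip=None):
    """Count connected components, optionally ignoring the node `skip`."""
    seen = set()
    count = 0
    for start in adjacency:
        if start == skip or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            for nxt in adjacency[node]:
                if nxt != skip and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count


def articulators(adjacency):
    """Nodes whose removal increases the number of connected components."""
    baseline = count_components(adjacency)
    return sorted(
        node for node in adjacency
        if count_components(adjacency, skip=node) > baseline
    )


def hubs(adjacency, top=2):
    """The `top` most connected nodes, ranked by degree and then by name."""
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:top]


# Hypothetical network: two triangular modules joined by the single link C-D.
PPI = build_adjacency([
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("D", "E"), ("E", "F"), ("F", "D"),
    ("C", "D"),
])

print(articulators(PPI))  # prints ['C', 'D']
print(hubs(PPI))          # prints ['C', 'D']
```

In this toy network the hubs and the articulators coincide, which is not true in general: a highly connected protein inside a dense module is a hub without being an articulator.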