2. Agenda
Discngine
Tibco Spotfire Connector
► How it works
► Integration challenges
Graph collection
► Quick introduction to graphs
► Implementation approaches (in-memory and graph databases)
► Quick demo / Use case
3. Discngine
Scientific computing consulting services and solutions
for pharmaceutical research
Customers: Sanofi, L’Oréal, IPSEN, Novartis, Roche, Pierre Fabre, CEREP,
P&G, Servier, Cephalon, Tibotec-Virco, Galapagos, Biofocus…
Founded in 2004 - Based in Paris, France - 17 Consultants
Come visit our booth for more information & demos
6. Tibco Spotfire Pipeline Pilot Connector
Architecture
Diagram labels: Pipeline Pilot Server, Tibco Spotfire Server, Discngine TS Connector, Collection, Discngine Web Panel, Client Management, Template storage.
7. Tibco Spotfire Pipeline Pilot Connector
Architecture (same diagram as the previous slide), adding: JavaScript – C# wrapper
8. Tibco Spotfire Pipeline Pilot Connector
Architecture (same diagram as the previous slide), adding: custom components based on the Reporting collection
9. Tibco Spotfire Pipeline Pilot Connector
Architecture (same diagram as the previous slide), adding: Oracle Application Express, other web server
10. Tibco Spotfire Pipeline Pilot Connector
Execution flow (basic protocol)
1. The Pipeline Pilot protocol runs
2. The Pipeline Pilot protocol generates an HTML page
3. The HTML page is rendered in an Internet Explorer .NET control inside the Discngine Web Panel
4. A JavaScript instruction is executed
5. A Spotfire C# API function is called
6. End of HTML page rendering
13. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► API encapsulation
9000+ Methods & Properties
28 components
14. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► API encapsulation
Example: “Event Listener”, a single component to
• Listen to marking events
• Create a hidden form
• Capture marked records identifiers
• Submit marked records to a PP protocol
15. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► Component parameters mapping & wording
Do you speak Pipelinish?
16. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► Component parameters mapping & wording
No, I speak Spotfirish!
17. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► Component parameters mapping & wording
How to capture advanced color gradients with component parameters?
Workaround: Spotfire templates
18. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► Client & server datasets synchronization
Data consistency
End users can modify the data context on the client side:
• Compute new columns
• Add & remove rows
• Drop & create data tables
• Initialize data sets on the client (new .dxp file)
19. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► Client & server datasets synchronization
• Option 1: HTTP Post
Diagram labels: Discngine Web Panel, Pipeline Pilot Server, data tables in .stdf format.
Implemented in v1.2
20. Tibco Spotfire Pipeline Pilot Connector
Integration challenges
► Client & server datasets synchronization
• Option 2: File copy
Diagram labels: Discngine Web Panel, filesystem, Pipeline Pilot Server, data tables in .stdf format, .stdf reader component.
Implemented in v1.2
24. What is a Graph ?
A graph is a data structure representing objects (nodes) that are connected to each other by links (edges, or relationships).
25. What is a Graph ?
Diagram legend: node, undirected edge, directed edge.
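To make the node / undirected edge / directed edge distinction concrete, here is a minimal adjacency-list sketch. It is an illustration only: the `Graph` class and its method names are hypothetical and are not part of the Discngine Graph Collection.

```python
from collections import defaultdict


class Graph:
    """Hypothetical minimal graph: nodes plus directed or undirected edges."""

    def __init__(self):
        # Maps each node to the set of nodes reachable through one edge.
        self.adjacency = defaultdict(set)

    def add_node(self, node):
        # Accessing the key is enough to register a node with no neighbors.
        self.adjacency[node]

    def add_edge(self, source, target, directed=False):
        self.add_node(source)
        self.add_node(target)
        self.adjacency[source].add(target)
        if not directed:
            # An undirected edge can be walked in both directions.
            self.adjacency[target].add(source)

    def neighbors(self, node):
        return sorted(self.adjacency[node])


g = Graph()
g.add_edge("A", "B")                 # undirected edge A - B
g.add_edge("B", "C", directed=True)  # directed edge B -> C
print(g.neighbors("A"), g.neighbors("B"), g.neighbors("C"))
```

The directed edge can be followed from B to C but not back, which is why C ends up with no neighbors in this sketch.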
28. Property Graph Data Model
Figure: example property graph. Nodes: Protein A, Protein B, Molecule 1, Molecule 2. Edge labels: interact, similar, inhibits, shareFragment.
29. Property Graph Data Model
Figure: example property graph. Nodes: Protein A, Protein B, Molecule 1, Molecule 2. Edge labels: interact, similar, inhibits, shareFragment. Properties shown: LogP = 1.1, pIC50 = 6.8.
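A property graph attaches key-value properties to both nodes and edges. The sketch below rebuilds the slide's example under stated assumptions: which element carries which property (LogP on a molecule node, pIC50 on an "inhibits" edge) and which nodes each edge connects are guesses made for illustration, and the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical property graph: labeled edges, properties on everything."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, name, **properties):
        self.nodes[name] = Node(name, properties)

    def add_edge(self, source, target, label, **properties):
        self.edges.append(Edge(source, target, label, properties))

    def edges_with_label(self, label):
        return [edge for edge in self.edges if edge.label == label]


pg = PropertyGraph()
pg.add_node("Protein A")
pg.add_node("Protein B")
pg.add_node("Molecule 1", LogP=1.1)  # assumed: LogP is a node property
pg.add_node("Molecule 2")
pg.add_edge("Protein A", "Protein B", "interact")
pg.add_edge("Molecule 1", "Molecule 2", "similar")
pg.add_edge("Molecule 1", "Molecule 2", "shareFragment")
# Assumed: pIC50 is stored on the "inhibits" relationship itself.
pg.add_edge("Molecule 1", "Protein A", "inhibits", pIC50=6.8)

for edge in pg.edges_with_label("inhibits"):
    print(edge.source, edge.label, edge.target, edge.properties)
```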
30. Graphs: when and why?
Graph Problems
► You need graphs if you have a problem that requires algorithms related to graph theory:
• Shortest path (GPS systems)
• Motif search (substructure search in molecules)
• Importance Measures (Google’s PageRank)
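As an illustration of the first problem in the list, an unweighted shortest path can be found with a breadth-first search. This is a generic textbook sketch, not the Graph Collection's implementation; the function name and the small example network are made up.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path (fewest edges) as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])    # FIFO queue: closest nodes are explored first
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = current
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical network of five stations.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(shortest_path(network, "A", "E"))
```

A weighted variant (for example, travel times) would typically replace the FIFO queue with a priority queue, as in Dijkstra's algorithm.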
31. Graphs: when and why?
Visualization
► You may want to use graphs as an intuitive way to represent objects and their relationships
• Subway Map
• Metabolic Pathways
• Protein-protein interaction networks
• Molecule depiction
32. Graphs: when and why?
Data Modeling (NoSQL / Big Data hype)
► You can use graphs as a flexible data model when your data consists of objects and relationships between them
• Google’s Knowledge Graph
• Facebook Graph Search
33. Discngine Graph Collection
Manage graphs as Pipeline Pilot data records:
► Creation and Manipulation
► Algorithms
► Persistence / IO
► Visualization
► Traversals (the “SQL” of graphs)
34. The big question
How can we represent graphs in the data flow ?
► A Graph is not flat
► A Graph has different types of data
► Advanced data structures are required to operate efficiently on graphs
35. The big question
How can we represent graphs in the data flow ?
Pipeline Pilot Data model
Pro:
• Native
• User and developer friendly
Cons:
• No objects, methods, etc.
• No Fibonacci heap, FIFO / LIFO queues, etc.
• Record hierarchy is a Tree
36. The big question
How can we represent graphs in the data flow ?
Pipeline Pilot Data model
Pro:
• Native
• User and developer friendly
Cons:
• No objects, methods, etc.
• No Fibonacci heap, FIFO / LIFO queues, etc.
• Record hierarchy is a Tree
JAVA / Perl API
Pro:
• Advanced programming framework
• Exposes most functions required to deal with data records
Cons:
• Performance: overhead induced by interfacing C++ and JAVA / Perl
37. The answer
How can we represent graphs in the data flow ?
A mixed solution:
► JAVA for performance and advanced data structures / object-oriented API
► Expose part of the data and processes via the data record tree and PilotScript
48. PilotGraph Model: cons
JAVA consumes memory
JAVA has a limited amount of memory allocated per job
► 384 MB on a 64-bit server – see apps/scitegic/core/xml/Objects/JavaEnvironment.xml
Serialization is OK for small to medium graphs, but the bigger the graph, the longer the serialization process takes
49. Graph Databases
Graph Databases are persistent engines dedicated to the storage of graph data structures.
The Graph Database Stack (not exhaustive):
► Neo4j
► OrientDB
► HyperGraphDB
► Titan
► Dex
► InfiniteGraph
► AllegroGraph
51. PilotGraph VS DatabaseGraph
PilotGraph (record): ~300 000 elements (depends on the amount of memory allocated to JAVA)
DatabaseGraph (connection): millions to billions of elements
54. Take home message
What is the best way to manage graphs within Pipeline Pilot?
► Take advantage of the PP JAVA API, which is the best tradeoff between performance and flexibility
► Expose as much of the data as possible via the Data Record hierarchy and PilotScript
► Use a common API to manage in-memory and persistent graph databases transparently
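The "common API" idea can be sketched as one interface with two backends, so that an algorithm written once runs on either. Everything below is hypothetical: the class names are invented, and SQLite stands in for a persistent store purely to keep the example self-contained; it is not how the Graph Collection talks to a graph database.

```python
import sqlite3
from abc import ABC, abstractmethod


class GraphStore(ABC):
    """Hypothetical common interface shared by all graph backends."""

    @abstractmethod
    def add_edge(self, source, target): ...

    @abstractmethod
    def neighbors(self, node): ...


class InMemoryGraph(GraphStore):
    def __init__(self):
        self._adjacency = {}

    def add_edge(self, source, target):
        self._adjacency.setdefault(source, set()).add(target)
        self._adjacency.setdefault(target, set()).add(source)

    def neighbors(self, node):
        return sorted(self._adjacency.get(node, ()))


class SqliteGraph(GraphStore):
    """Stand-in for a persistent backend (assumption: SQLite, not a graph DB)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS edges (source TEXT, target TEXT)")

    def add_edge(self, source, target):
        with self._db:  # commits the two inserts as one transaction
            self._db.executemany(
                "INSERT INTO edges VALUES (?, ?)",
                [(source, target), (target, source)])

    def neighbors(self, node):
        rows = self._db.execute(
            "SELECT DISTINCT target FROM edges WHERE source = ? "
            "ORDER BY target", (node,))
        return [row[0] for row in rows]


def reachable(store, start):
    """Works on any GraphStore: the algorithm never sees the backend."""
    seen, stack = {start}, [start]
    while stack:
        for neighbor in store.neighbors(stack.pop()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


results = []
for store in (InMemoryGraph(), SqliteGraph()):
    store.add_edge("A", "B")
    store.add_edge("B", "C")
    store.add_edge("X", "Y")
    results.append(reachable(store, "A"))
print(results)
```

The design choice illustrated here is that only `add_edge` and `neighbors` know where the data lives, so switching from a record-sized graph to a database-sized one does not require rewriting the algorithm.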
55. Thank you for your attention
Traversals, Visualization, Reporting Integration, Algorithms, Roadmap…
Welcome to our booth
57. Graph Collection v 2.0
BASIC MANIPULATIONS
► Add / Remove elements
• From Cache
• From Records
► PilotScript facilities
• Remove elements with PilotScript
• Set property values
► Add / Remove / Keep Properties
► Join Graph Records
► Intersect Graph Records
► Extract Edges and Nodes
► Key-Value property search
► Traversal framework
GRAPH ALGORITHMS
► Shortest Path (weighted / unweighted)
► Minimum Spanning Tree
► Cliques
► Disconnected sub-graphs
► Articulators
► Subgraph-matching
IMPORTANCE MEASURES
► Degree centrality
► Closeness centrality
► Density
► Distance to query
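For orientation, two of the measures listed above have short textbook definitions for an undirected graph: degree centrality is a node's degree divided by the number of other nodes, and density is the number of edges divided by the number of possible edges. The sketch below uses those standard formulas; the Graph Collection components may normalize differently, and the function names are hypothetical.

```python
def degree_centrality(adjacency):
    """Degree of each node divided by (n - 1), for an undirected graph."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1)
            for node, neighbors in adjacency.items()}


def density(adjacency):
    """Number of edges divided by the n * (n - 1) / 2 possible edges."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two neighbor sets, hence the halving.
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) / 2
    return 2 * edge_count / (n * (n - 1))


# Hypothetical star-shaped graph: one hub connected to three leaves.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
centrality = degree_centrality(star)
print(centrality["hub"], centrality["a"], density(star))
```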
58. Graph Collection v 2.0
VISUALIZATION
► Layouts
• ARF
• Fruchterman-Reingold
• GraphViz
• GraphViz
► GraphViz Integration
► HTML 5 Interactive Viewer
► Cytoscape Web Report
REPORTING INTEGRATION
► GraphViz image report
► HTML 5 Graph report (prototype)
► Cytoscape Web Report (prototype)
READERS AND WRITERS
► GraphML
► SIF (Cytoscape)
► GEXF
GRAPH DATABASE
► Neo4j Integration
► ACID transactions
► Algorithms can be applied on graph databases in a transparent way
► Scales to millions of nodes and edges
59. Traversal ?
“I have an active molecule on protein P: which other protein(s) can potentially be inhibited by this molecule?”
Step 0: Find your query in the graph
Figure: graph with the node labeled Query.
60. Traversal ?
Step 1: Fetch similar molecules: walk through “similar” relationships
Figure labels: Query, similar.
61. Traversal ?
Step 1: Fetch similar molecules: save molecules
Figure labels: Query, similar, Mol, Mol.
62. Traversal ?
Step 2: Fetch associated proteins: walk through “activates” and “inhibits” relationships (and anything else related to our problem)
Figure labels: Query, similar, Mol, Mol, inhibits (pIC50 = 8.8).
63. Traversal ?
Step 3: Collect the (potential!) winners
Figure labels: Query, similar, Mol, Mol, inhibits (pIC50 = 8.8), Protein B, Protein C.
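Steps 0 to 3 can be sketched as a small label-driven traversal over an edge list. This is an illustration under assumptions, not the Graph Collection's traversal framework: the edge list is invented (in particular, which molecule inhibits which protein), and the helper names are hypothetical.

```python
# Hypothetical edge list: (source, label, target).
EDGES = [
    ("Query", "similar", "Mol 1"),
    ("Query", "similar", "Mol 2"),
    ("Query", "inhibits", "Protein P"),
    ("Mol 1", "inhibits", "Protein B"),
    ("Mol 2", "inhibits", "Protein C"),
    ("Mol 2", "inhibits", "Protein P"),
]


def step(edges, start_nodes, labels, undirected=False):
    """Follow every edge carrying one of `labels` out of `start_nodes`."""
    found = set()
    for source, label, target in edges:
        if label not in labels:
            continue
        if source in start_nodes:
            found.add(target)
        if undirected and target in start_nodes:
            found.add(source)
    return found


def candidate_targets(edges, query, known_target):
    # Step 0: find the query in the graph.
    all_nodes = {node for source, _, target in edges
                 for node in (source, target)}
    if query not in all_nodes:
        raise KeyError(query)
    # Step 1: walk through "similar" relationships and save the molecules.
    molecules = step(edges, {query}, {"similar"}, undirected=True)
    molecules.discard(query)
    # Step 2: walk through "activates" and "inhibits" relationships.
    proteins = step(edges, molecules, {"activates", "inhibits"})
    # Step 3: collect the potential winners, excluding the known target.
    return sorted(proteins - {known_target})


print(candidate_targets(EDGES, "Query", "Protein P"))
```

Edge properties such as pIC50 are left out here; a fuller sketch could filter step 2 on a potency threshold before collecting the proteins.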