Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Presented at DataWeek SF Oct 13

Most analytics depend on data mining and statistical correlation of information held in a single data store. It is generally inefficient to replicate diverse data, which may be stored in enterprise databases or NoSQL "Big Data" repositories, and consolidate it using a single database technology. Although federated queries can help with statistical correlation of data values across data stores, the technique handles the data held in relationships poorly because the data stores generally have no knowledge of one another. The speaker describes a different approach that uses graph (relationship) analytics to extract structural data from existing repositories, store representations of the nodes and connections in a graph database, and then analyze them to extract additional value.

Published in: Technology


  1. Copyright © Objectivity, Inc. 2013. Using A Distributed Graph Database To Make Sense Of Disparate Data Stores. Leon Guzenda, Dataweek San Francisco, October 2, 2013
     • Current Big Data Analytics
     • Graph Analytics
     • InfiniteGraph
     • The ETL & Discovery Process
  2. Objectivity Inc.
     • Objectivity, Inc. is headquartered in Sunnyvale, CA.
     • Objectivity has over two decades of Big Data and NoSQL experience.
     • We develop NoSQL platforms for managing and discovering relationships and patterns in complex data:
       - Objectivity/DB: an object database that manages localized, centralized or distributed databases
       - InfiniteGraph: a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data
     • Millions of deployments: our technology is embedded in hundreds of enterprise and government systems and commercial products.
  3. A Typical Objectivity Deployment - Sensor Data Fusion (Network Centric Collaborative Targeting)
  4. A Typical InfiniteGraph Deployment - GraphMyLife
  5. A Typical “Big Data” Analytics Setup
     [Diagram, top to bottom]
     • Data Aggregation and Analytics Applications
     • Commodity Linux Platforms and/or High Performance Computing Clusters
     • Data types: Structured, Semi-Structured, Unstructured
     • Data stores: Graph DB, Object DB, Doc DB, K-V Store, Hadoop, Column Store, Data W/H, RDBMS
  6. Incremental Analytics Improvements Aren’t Enough
     All current solutions use the same basic architectural model.
     • None of the popular solutions has an efficient way to store connections between entities in different silos.
     • Most analytic technology focuses on the content of the data nodes, rather than the many kinds of connections between the nodes and the data in those connections.
     • Why? Because traditional and earlier NoSQL solutions are bad at handling relationships.
     • Graph databases can efficiently store, manage and query the many kinds of relationships hidden in the data.
  7. Graph Analytics
  8. Graph (Relationship) Analytics... A SQL Shortcoming
     Think about the SQL query for finding all links between the two “blue” rows... it's hard!
     [Diagram: Table_A, Table_B, Table_C, Table_D, Table_E, Table_F, Table_G]
     There are some kinds of complex relationship handling problems that SQL wasn't designed for.
  9. ...Graph Analytics: A SQL Shortcoming
     InfiniteGraph - The solution can be found with a few lines of code.
     [Diagram: Table_A through Table_G, with rows A3 and G4 marked]
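To illustrate why a traversal is a more natural fit here than a chain of SQL joins, the following is a minimal sketch in plain Python. It is not the InfiniteGraph API. Only the endpoints A3 and G4 come from the slide; every other row identifier and every link is invented for illustration.

```python
from collections import deque

# Hypothetical links between rows of Table_A..Table_G (think foreign keys).
# Only the endpoints A3 and G4 appear on the slide; the rest is made up.
LINKS = [
    ("A3", "B1"), ("B1", "C7"), ("C7", "G4"),
    ("A3", "D2"), ("D2", "E5"), ("E5", "F9"), ("F9", "G4"),
    ("B1", "E5"),
]


def build_adjacency(links):
    """Turn a list of undirected links into a node -> neighbors mapping."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def all_simple_paths(adjacency, start, goal):
    """Return every cycle-free path from start to goal, shortest first."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in path:  # never revisit a row on the same path
                queue.append(path + [neighbor])
    return paths


paths = all_simple_paths(build_adjacency(LINKS), "A3", "G4")
for path in paths:
    print(" -> ".join(path))
```

The equivalent SQL would need a separate join chain for each possible route through the tables, whereas the traversal does not need to know the route in advance.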
  10. Applications for Graph Analytics
      • Logistics
      • Healthcare Informatics
      • Market Analysis
      • Social Network Analysis
  11. Representing the Graph...
      The existing COMINT and HUMINT data might look like this:
      • Events/Places and People/Orgs: Situation X, Situation Y, Target T, Bank X, Cafe C, Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S
      • Facts: S Seen Near T; A Banks At X; A Called P; A Seen At Y; A Seen Near X; P Emailed S; P Called Q; Q Seen Near T; P Called R; R Seen Near T; X Paid S; A Eats At C
  12. Representing the Graph...
      We start by identifying the nodes (Vertices) and the connections (Edges).
      • NODES (Events/Places and People/Orgs): Situation X, Situation Y, Target T, Bank X, Cafe C, Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S
      • CONNECTIONS (Facts): S Seen Near T; A Banks At X; A Called P; A Seen At Y; A Seen Near X; P Emailed S; P Called Q; Q Seen Near T; P Called R; R Seen Near T; X Paid S; A Eats At C
  13. ...Representing the Graph.. [Diagram: VERTEX (“Nodes”) and EDGE (“Connections”), annotated 2 and N]
  14. ...Representing the Graph..
      [Diagram: each VERTEX (“Node”) drawn as a labeled shape and each EDGE (“Connection”) as a labeled line]
      • Vertices: Situation X, Situation Y, Target T, Bank X, Cafe C, Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S
      • Edge labels: Seen Near, Seen At, Called, Emailed, Banks At, Paid, Eats At
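The step from a list of facts to vertices and edges can be sketched in a few lines of plain Python (this is not the InfiniteGraph API). The single-letter names from the slide are expanded here, and two readings are assumptions: "X" is taken to mean Bank X in "Banks At" and "Paid" but Situation X in "Seen Near", and "A Eats At" is taken to point at Cafe C.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["source", "label", "target"])

# Facts from the slide as (source, label, target) triples.
# Assumption: "X" is Bank X for "Banks At"/"Paid" and Situation X for
# "Seen Near"; "A Eats At" is assumed to refer to Cafe C.
FACTS = [
    ("Civilian S", "Seen Near", "Target T"),
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
    ("Combatant A", "Eats At", "Cafe C"),
]


def build_graph(facts):
    """Split fact triples into a vertex set and a list of labeled edges."""
    vertices = set()
    edges = []
    for source, label, target in facts:
        vertices.update((source, target))
        edges.append(Edge(source, label, target))
    return vertices, edges


vertices, edges = build_graph(FACTS)

# Index edges by their source vertex so a traversal can follow them.
outgoing = {}
for edge in edges:
    outgoing.setdefault(edge.source, []).append(edge)

for edge in outgoing["Combatant A"]:
    print(f"{edge.source} --{edge.label}--> {edge.target}")
```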
  15. ...Analyzing the Graph...
      [Diagram: the same graph of vertices (Situation X, Situation Y, Target T, Bank X, Cafe C, Combatant A, Civilians P, Q, R, S) and edges (Seen Near, Seen At, Called, Emailed, Banks At, Paid, Eats At)]
  16. ...Threat Analysis
      [Diagram: the same graph without Cafe C, with groups of nodes marked SUSPECTS and NEEDS PROTECTION]
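The slide does not say which rule assigns the SUSPECTS and NEEDS PROTECTION labels, so the sketch below stops short of labeling anyone. It only shows the kind of traversal such an analysis could start from: list everyone seen near Target T and the shortest chain of connections leading to each of them from Combatant A. It is plain Python rather than the InfiniteGraph API, and the expansion of the single-letter names is an assumption.

```python
from collections import deque

# Edges from the threat-analysis slide as (source, label, target).
# Assumption: "X" means Bank X for "Banks At"/"Paid", Situation X otherwise.
EDGES = [
    ("Civilian S", "Seen Near", "Target T"),
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
]


def chains_from(start, edges):
    """Breadth-first search along edge direction.

    Returns, for each reachable vertex, one shortest chain written as
    [vertex, label, vertex, label, ...].
    """
    outgoing = {}
    for source, label, target in edges:
        outgoing.setdefault(source, []).append((label, target))
    best = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for label, target in outgoing.get(node, []):
            if target not in best:
                best[target] = best[node] + [label, target]
                queue.append(target)
    return best


seen_near_target = sorted(
    source for source, label, target in EDGES
    if label == "Seen Near" and target == "Target T"
)
chains = chains_from("Combatant A", EDGES)
for person in seen_near_target:
    print(" -> ".join(chains[person]))
```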
  17. Visual Analytics
  18. Graphs Can Scale Very Quickly
      We often hear about the “trillion row” database. Amazon S3 has reached 2 trillion objects, but one Objectivity site:
      • Processes tens of trillions of objects per day
      • Supports over 1000 analysts around the clock
      Consider a graph where each node has 10 connections:
      • At 6 degrees of separation, finding a path between two nodes may require traversing a million links
      • 9 degrees of separation requires a billion traversals
      • 12 degrees of separation requires a trillion traversals
      • 15 degrees of separation requires a quadrillion traversals...
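The figures on this slide follow from simple exponentiation: with 10 connections per node, a search that fans out d steps can touch up to 10^d links. A small sketch of that arithmetic (the function name is ours, and the result is an order-of-magnitude upper bound that ignores revisited nodes):

```python
def worst_case_traversals(connections_per_node, depth):
    """Upper bound on links followed when fanning out `depth` steps."""
    return connections_per_node ** depth


for depth in (6, 9, 12, 15):
    print(f"{depth:>2} steps: {worst_case_traversals(10, depth):,} traversals")
```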
  19. THE ETL & DISCOVERY PROCESS
  20. Not Only SQL: A group of 4 primary technologies [Diagram: technologies arranged from Simple to Highly Interconnected]
  21. InfiniteGraph - The Enterprise Graph Database
      • A high performance distributed database engine that supports analyst-time decision support and actionable intelligence
      • Cost effective link analysis: flexible deployment on commodity resources (hardware and OS)
      • Efficient, scalable, risk averse technology: enterprise proven
      • High speed parallel ingest to load graph data quickly
      • Parallel, distributed queries
      • Flexible plugin architecture
      • Complementary technology
      • Fast proof of concept: easy to use Graph API
  22. InfiniteGraph Capabilities
      • Parallel Graph Traversal
      • Inclusive or Exclusive Selection
      • Shortest or All Paths Between Objects
      • Compute Cost To Date
      • Visualize
      • Computational & Visualization Plug-Ins
  23. A Powerful InfiniteGraph Query
      Problem: Find the cheapest route for moving a 200 ton load from San Francisco to San Jose.
      [Map: San Francisco, Oakland, Pacifica, Hillsboro, Half Moon Bay, Palo Alto, Cupertino and San Jose, connected by Water, Rail and Road edges]
      // Policies: Depth_First, Exclude_Railway_Edge, Exclude_Road_Edge
      // Calculate: Cost_To_This_City()
      // Navigate: From “San Francisco” To “San Jose”
      // Visualizer: Map_Cheapest_Route
      // Visualizer: List_Cost_Breakdown
      Note: This is pseudocode, not the actual Java statements.
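For readers who want to see the idea behind the pseudocode in runnable form, here is a minimal sketch in plain Python. It does not use the InfiniteGraph API, and it swaps the slide's depth-first policy for Dijkstra's algorithm, which is a standard way to find a cheapest path. The connections and costs are invented; only the city names and the three edge types come from the slide.

```python
import heapq

# (from, to, mode, cost) -- every connection and cost here is hypothetical.
ROUTES = [
    ("San Francisco", "Oakland", "water", 40),
    ("Oakland", "San Jose", "rail", 90),
    ("San Francisco", "Pacifica", "road", 30),
    ("Pacifica", "Half Moon Bay", "road", 35),
    ("Half Moon Bay", "Palo Alto", "road", 50),
    ("San Francisco", "Palo Alto", "rail", 70),
    ("Palo Alto", "Cupertino", "road", 25),
    ("Cupertino", "San Jose", "road", 20),
    ("Palo Alto", "San Jose", "rail", 35),
    ("San Francisco", "San Jose", "water", 150),
]


def cheapest_route(routes, start, goal, excluded_modes=()):
    """Dijkstra's algorithm over the routes whose mode is not excluded.

    Returns (total_cost, [cities]) or None when no route exists.
    """
    graph = {}
    for a, b, mode, cost in routes:
        if mode in excluded_modes:  # the "Exclude_..._Edge" policies
            continue
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    frontier = [(0, start, [start])]  # (cost so far, city, path taken)
    settled = set()
    while frontier:
        cost, city, path = heapq.heappop(frontier)
        if city == goal:
            return cost, path
        if city in settled:
            continue
        settled.add(city)
        for neighbor, step_cost in graph.get(city, []):
            if neighbor not in settled:
                heapq.heappush(
                    frontier, (cost + step_cost, neighbor, path + [neighbor])
                )
    return None


unrestricted = cheapest_route(ROUTES, "San Francisco", "San Jose")
water_only = cheapest_route(
    ROUTES, "San Francisco", "San Jose", excluded_modes=("rail", "road")
)
print(unrestricted)
print(water_only)
```

Filtering edges before the search mirrors the slide's exclusion policies: the traversal never considers an excluded edge type, so the restricted query explores a smaller graph.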
  24. Recognizing Graphs In Object Models...
      [Diagram labels: Tree Structures; Graph (Network) Structures; Object Class A; Relationship Data; 1-to-Many; Many-to-Many]
  25. ...Recognizing Graphs In Object Models
      [Diagram labels: Tree Structures; Graph (Network) Structures; Object Class A; Relationship Data; 1-to-Many; Many-to-Many; EDGE; VERTEX; GRAPH MODEL]
  26. The ETL Process
      [Diagram, top to bottom]
      • ETL Tools/Applications
      • Commodity Linux Platforms and/or High Performance Computing Clusters
      • Data types: Structured, Semi-Structured, Unstructured
      • Data stores: Object DB, Graph DB, Doc DB, K-V Store, Hadoop, Column Store, Data W/H, RDBMS
      • Nodes & Edges are extracted from the other stores into the Graph DB
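A minimal sketch of the extraction step, assuming two made-up silos: a relational table of customers and a document store of orders. The record shapes, field names and the "purchased" edge label are all hypothetical. The point is only that entities become nodes keyed by (type, id), and references between silos become edges.

```python
# Hypothetical extract from a relational table: (customer_id, name)
CUSTOMER_ROWS = [(1, "Alice"), (2, "Bob")]

# Hypothetical extract from a document store: one record per order
ORDER_DOCS = [
    {"order_id": "o-1", "customer_id": 1, "skus": ["widget", "gadget"]},
    {"order_id": "o-2", "customer_id": 2, "skus": ["widget"]},
    {"order_id": "o-3", "customer_id": 3, "skus": ["gizmo"]},
]


def extract_graph(customer_rows, order_docs):
    """Build nodes keyed by (type, id) and edges that span both silos."""
    nodes = {}
    edges = []
    for customer_id, name in customer_rows:
        nodes[("customer", customer_id)] = {"name": name}
    for doc in order_docs:
        customer = ("customer", doc["customer_id"])
        # A customer known only to the order store still becomes a node.
        nodes.setdefault(customer, {"name": None})
        for sku in doc["skus"]:
            product = ("product", sku)
            nodes.setdefault(product, {})
            edges.append(
                (customer, "purchased", product, {"order_id": doc["order_id"]})
            )
    return nodes, edges


nodes, edges = extract_graph(CUSTOMER_ROWS, ORDER_DOCS)
buyers_of_widget = sorted(
    source[1] for source, label, target, _ in edges
    if target == ("product", "widget")
)
print(len(nodes), "nodes,", len(edges), "edges")
print("customers who bought a widget:", buyers_of_widget)
```

Only the keys and the connections are copied into the graph; the full records can stay in their original stores, which avoids replicating the data wholesale.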
  27. Commonly Used Graph Algorithms...
      • Connectedness
      • Node degree
      • Shortest Path
      • Average path length
      • Transitive Closure
      • Graph diameter (or Span)
      • Centrality (Betweenness, Degree and Closeness)
      In the graph below, node D has the highest betweenness centrality. [Graph diagram not captured in the transcript]
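Several of these measures can be computed with nothing more than breadth-first search. The sketch below uses an invented seven-node graph (the slide's own graph did not survive in the transcript), chosen so that node D bridges two triangles. It assumes an undirected, connected graph, and it uses a simple all-pairs formulation rather than a faster method such as Brandes' algorithm.

```python
from collections import deque
from itertools import combinations

# Hypothetical graph: two triangles (A, B, C) and (E, F, G) bridged by D.
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("D", "E"),
    ("E", "F"), ("E", "G"), ("F", "G"),
]


def adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs(adj, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                count[neighbor] += count[node]
    return dist, count


adj = adjacency(EDGES)
info = {node: bfs(adj, node) for node in adj}
pairs = list(combinations(sorted(adj), 2))

degree = {node: len(adj[node]) for node in adj}
pair_distances = [info[s][0][t] for s, t in pairs]
diameter = max(pair_distances)
average_path_length = sum(pair_distances) / len(pair_distances)

# Betweenness: for each pair (s, t), the share of shortest s-t paths
# that pass through v, summed over all pairs.
betweenness = {node: 0.0 for node in adj}
for s, t in pairs:
    dist_s, count_s = info[s]
    dist_t, count_t = info[t]
    for v in adj:
        if v in (s, t):
            continue
        if dist_s[v] + dist_t[v] == dist_s[t]:
            betweenness[v] += count_s[v] * count_t[v] / count_s[t]

most_central = max(betweenness, key=betweenness.get)
print("degree:", degree)
print("diameter:", diameter)
print("average path length:", round(average_path_length, 2))
print("highest betweenness:", most_central)
```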
  28. A Typical Deployment Supplements Traditional or Big Data Systems With Graph Analytics
      [Diagram labels: Data Visualization & Analytics; Big Data Connection Platform; Conventional & Relationship Analytics; ORACLE; Big Data Solutions; vendor logos annotated “*Now HP” and “*Now IBM”]
  29. Online Demo - Call Detail Record Analysis
      Used in law enforcement, counter-terrorism and Customer Relationship Management
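The demo itself is not reproduced in the transcript. As a rough illustration of what call detail record (CDR) analysis involves, the sketch below turns a handful of invented records into a call graph and asks two typical questions: who has the most distinct contacts, and which contacts two numbers share. The record layout and phone numbers are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical call detail records: (caller, callee, duration_seconds)
CDRS = [
    ("555-0101", "555-0102", 120),
    ("555-0101", "555-0103", 30),
    ("555-0102", "555-0103", 45),
    ("555-0104", "555-0103", 10),
    ("555-0101", "555-0102", 60),
]


def call_graph(cdrs):
    """Aggregate raw records into weighted edges and a contact index."""
    calls = Counter()    # (caller, callee) -> number of calls
    seconds = Counter()  # (caller, callee) -> total talk time
    contacts = defaultdict(set)  # number -> everyone it called or was called by
    for caller, callee, duration in cdrs:
        calls[(caller, callee)] += 1
        seconds[(caller, callee)] += duration
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return calls, seconds, contacts


calls, seconds, contacts = call_graph(CDRS)

most_connected = max(contacts, key=lambda number: len(contacts[number]))
shared = contacts["555-0101"] & contacts["555-0104"]

print("most distinct contacts:", most_connected)
print("contacts shared by 555-0101 and 555-0104:", sorted(shared))
```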
  30. Thank You!
      Please take a look at objectivity.com for InfiniteGraph online demos, white papers, free downloads, samples and tutorials, and visit our booth for a demonstration.
