1. Realize The Value In Your Big Data With Graph Technology. www.Objectivity.com. Leon Guzenda, Objectivity, Inc. DBTA Webinar, January 17, 2013
2. Overview • Who We Are • Current Big Data Analytics • Relationship Analytics • Graph Technologies • The Big Data Connection Platform
3. About Objectivity Inc. • Objectivity, Inc. is headquartered in Sunnyvale, California. • Established in 1988 to tackle database problems that network, hierarchical, relational and file-based technologies struggle with. • Objectivity has over two decades of Big Data and NoSQL experience. • Develops NoSQL platforms for managing and discovering relationships and patterns in complex data: – Objectivity/DB: an object database that manages localized, centralized or distributed databases – InfiniteGraph: a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data • Embedded in hundreds of enterprises, government organizations and products, with millions of deployments.
4. Human Intelligence (HUMINT) Analysis
5. Big Data Technologies Are Still Evolving
6. Current “Big Data” Analytics. We All Know The Problem: Information Overload! Volume, Velocity, Variety, Veracity, Value... Making sense of it all takes time and $$$.
7. A Typical “Big Data” Analytics Setup [Diagram: Data Aggregation and Analytics Applications over separate stores (Column Store, Data W/H, Graph DB, Object DB, K-V Store, RDBMS, Hadoop, Doc DB), running on Commodity Linux Platforms and/or High Performance Computing Clusters and holding Structured, Semi-Structured and Unstructured data]
8. Incremental Improvements Aren’t Enough • All current solutions use the same basic architectural model. • None of the current solutions has a way to store connections between entities in different silos. • Most analytic technology focuses on the content of the data nodes rather than on the many kinds of connections between the nodes and the data in those connections. • Why? Because traditional and earlier NoSQL solutions are bad at handling relationships. • Graph databases can efficiently store, manage and query the many kinds of relationships hidden in the data.
9. Not Only SQL: a group of 4 primary technologies • Key-Value Stores • “Big Table” Clones • Document Databases • Object and Graph Databases (diagram labels: Graph Database, Graph Processing)
10. Not Only SQL: A group of 4 primary technologies (diagram scale: from Simple to Highly Interconnected data)
11. Graph Theory Terminology... VERTEX: A single node in a graph data structure. EDGE: A connection between a pair of VERTICES. PROPERTIES: Data items that belong to a particular Vertex (or Edge). WEIGHT: A quantity associated with a particular Edge. GRAPH: A collection of linked Vertex and Edge objects. Example: Vertex 1 (City: San Francisco, Pop: 812,826) is linked by Edge 1 (Road: US-101, Miles: 47.8) to Vertex 2 (City: San Jose, Pop: 967,487).
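To make the terms concrete, here is a minimal Python sketch of the San Francisco / San Jose example: two Vertex objects carrying properties, joined by one Edge whose weight is the road distance in miles. The class and field names (`Vertex`, `Edge`, `Graph`, `vid`, `eid`) are illustrative assumptions, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A single node; `properties` holds the data items that belong to it."""
    vid: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A connection between a pair of vertices, with its own properties and a weight."""
    eid: int
    a: Vertex
    b: Vertex
    properties: dict = field(default_factory=dict)
    weight: float = 1.0


@dataclass
class Graph:
    """A collection of linked Vertex and Edge objects."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)


sf = Vertex(1, {"City": "San Francisco", "Pop": 812_826})
sj = Vertex(2, {"City": "San Jose", "Pop": 967_487})
road = Edge(1, sf, sj, weight=47.8)  # weight = road distance in miles
g = Graph([sf, sj], [road])
print(road.a.properties["City"], "->", road.b.properties["City"], road.weight)
```

Keeping the distance in `weight` rather than only as a named property lets generic algorithms, such as shortest-path searches, read it without knowing the domain.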
12. ...Graph Theory Terminology... SIMPLE/UNDIRECTED GRAPH: A Graph where each VERTEX may be linked to one or more Vertex objects via Edge objects and each Edge object is connected to exactly two Vertex objects. Furthermore, neither Vertex connected to an Edge is more significant than the other. DIRECTED GRAPH: A Graph in which every Edge (an “Arc”) has a direction: in each Vertex + Edge + Vertex group, one Vertex is the “Tail” of the Arc and the other is the “Head”. MIXED GRAPH: A Graph in which some Edges are Undirected and others are Directed.
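The difference between undirected, directed and mixed graphs shows up in how adjacency is recorded. A minimal sketch, assuming edges arrive as hypothetical `(u, v, directed)` triples:

```python
from collections import defaultdict


def adjacency(edges):
    """Build an adjacency map from (u, v, directed) triples.

    An undirected edge is recorded from both ends, since neither vertex is
    more significant. A directed edge (arc) is recorded only from its tail u
    toward its head v. Mixing both kinds gives a mixed graph.
    """
    adj = defaultdict(set)
    for u, v, directed in edges:
        adj[u].add(v)
        if not directed:
            adj[v].add(u)
    return adj


# A-B is undirected; B->C is an arc with tail B and head C.
mixed = adjacency([("A", "B", False), ("B", "C", True)])
print(sorted(mixed["B"]))             # ['A', 'C']
print(sorted(mixed.get("C", set())))  # []
```

From B both A and C are reachable, but C has no outgoing entry because the B->C arc only points one way.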
13. ...Graph Theory Terminology LOOP: An Edge whose two ends are linked to the same Vertex. MULTIGRAPH: A Graph that allows multiple Edges between the same pair of Vertices, as well as Loops. QUIVER: A directed Graph where Vertices are allowed to be connected by multiple Arcs. A Quiver may include Loops. WEIGHTED GRAPH: A Graph where a quantity is assigned to each Edge, e.g. a Length assigned to an Edge representing a road between two Vertices representing cities. HALF EDGE: An Edge that is only connected to a single Vertex. LOOSE EDGE: An Edge that isn't connected to any Vertices. CONNECTIVITY: Two Vertices are Connected if it is possible to find a path between them.
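Connectivity can be tested with a breadth-first search. The sketch below assumes an undirected graph given as a list of vertex pairs, and shows that parallel edges (a multigraph) and loops do not affect whether two vertices are Connected:

```python
from collections import deque


def connected(edges, start, goal):
    """Return True if some path links `start` to `goal` (undirected).

    `edges` is a list of (u, v) pairs and may contain parallel edges
    (a multigraph) and loops (u == v); neither changes reachability.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:  # visit each vertex once
                seen.add(nxt)
                queue.append(nxt)
    return False


# Two parallel A-B edges, a loop on C, and a C-D edge.
multigraph = [("A", "B"), ("A", "B"), ("C", "C"), ("C", "D")]
print(connected(multigraph, "A", "B"))  # True
print(connected(multigraph, "A", "D"))  # False
```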
14. Relationship Analytics
15. Example 1 – Social Network Analysis. Sources may be covert or open: Telecom Call Detail Records, Banking transactions, Flight and hotel reservations, MASINT, Twitter, Facebook, Google+, LinkedIn, Plaxo, Flickr, YouTube
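As a rough illustration of how such sources feed a social network analysis, the sketch below turns call detail records into an undirected contact graph, uses call counts as edge weights, and ranks people by degree (number of distinct contacts). The records and names are invented for the example; real CDRs carry many more fields.

```python
from collections import Counter, defaultdict

# Hypothetical call detail records: (caller, callee) pairs.
cdrs = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("dave", "carol"), ("alice", "bob"),
]

contacts = defaultdict(set)  # undirected "has called / been called by"
call_counts = Counter()      # edge weight: number of calls per pair
for caller, callee in cdrs:
    contacts[caller].add(callee)
    contacts[callee].add(caller)
    call_counts[frozenset((caller, callee))] += 1

# Degree = number of distinct contacts; high-degree people are hubs.
degree = {person: len(peers) for person, peers in contacts.items()}
hub = max(degree, key=degree.get)
print(hub, degree[hub])                              # carol 3
print(call_counts[frozenset(("alice", "bob"))])      # 2
# Mutual contacts of alice and dave:
print(sorted(contacts["alice"] & contacts["dave"]))  # ['carol']
```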
16. Example 2 – Finding Patterns In Open Source Data... The Challenges: • Data Volumes • Fast-Changing Data • Sensitivity of Data • Significance of Data
17. ...Example 2 – Finding Patterns In Open Source Data
18. Example 3 – Logistics
19. Example 4 - Cyber Security...
20. … Example 4 - Cyber Security
21. Link Hunter - Proof of Concept (POC) For A Federal Police Force. Run the live demo at objectivity.com [Resources, Live Demos]
22. MAKING GRAPH ANALYTICS WORK EFFICIENTLY
23. Relationship (Connection) Analytics... A SQL Shortcoming. Think about the SQL query for finding all links between the two “blue” rows... it's hard! [Diagram: rows linked across Table_A, Table_B, Table_C, Table_D, Table_E, Table_F and Table_G] There are some kinds of complex relationship-handling problems that SQL wasn't designed for.
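SQL is not entirely without recourse: SQL:1999 added recursive common table expressions, which SQLite and most other modern RDBMSs support. The sketch below uses Python's built-in `sqlite3` module and assumes a hypothetical, deliberately simplified schema in which every row-to-row link sits in one `links` table. Even then, the query has to carry the path as a string and guard against cycles by hand, and a real schema with links spread over Table_A to Table_G would need a different join for every hop.

```python
import sqlite3

# Hypothetical, simplified schema: one generic `links` table of row-to-row
# connections. Real schemas spread these links across many tables and key
# columns, so each hop would need its own join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("A3", "B1"), ("B1", "D2"), ("D2", "G4"), ("A3", "C5"), ("C5", "G4")],
)

# Recursive CTE: grow paths outward from A3, keeping the path as a string
# and refusing to revisit a row already on it (cycle guard).
query = """
WITH RECURSIVE walk(node, path) AS (
    SELECT 'A3', 'A3'
    UNION ALL
    SELECT l.dst, w.path || '>' || l.dst
    FROM walk AS w JOIN links AS l ON l.src = w.node
    WHERE instr(w.path, l.dst) = 0
)
SELECT path FROM walk WHERE node = 'G4' ORDER BY path
"""
for (path,) in conn.execute(query):
    print(path)
# A3>B1>D2>G4
# A3>C5>G4
```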
24. Relationship (Connection) Analytics... A SQL Shortcoming [Diagram: Table_A to Table_G, with the two rows A3 and G4 marked] InfiniteGraph: the solution can be found with a few lines of code.
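The slide does not show InfiniteGraph's own code, so the following is not its API. It is a plain-Python sketch of the same idea: once rows are stored as vertices with direct links, finding all connections between A3 and G4 is a short depth-first search. The intermediate row ids (`B1`, `C5`, and so on) and their links are hypothetical.

```python
def all_paths(adj, start, goal, path=None):
    """Yield every simple path from start to goal by depth-first search."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in adj.get(start, []):
        if nxt not in path:  # simple paths only: no revisits
            yield from all_paths(adj, nxt, goal, path)


# Hypothetical row-level links across Table_A ... Table_G, held as an
# adjacency list (each row id points at the rows it is linked to).
adj = {
    "A3": ["B1", "C5"],
    "B1": ["D2"],
    "C5": ["G4", "E7"],
    "D2": ["G4"],
    "E7": ["F2"],
    "F2": ["G4"],
}
for p in all_paths(adj, "A3", "G4"):
    print(" -> ".join(p))
```

The traversal follows stored links directly, so no join has to be written per table pair.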
25. Representing the Graph... The existing data might look like this: Events/Places: Situation X, Situation Y, Target T, Cafe C. People/Orgs: Combatant A, Bank X, Civilian P, Civilian Q, Civilian R, Civilian S. Facts: A Called P, A Seen Near X, P Emailed S, P Called Q, Q Seen Near T, X Paid S, R Seen Near T, P Called R, A Banks at X, S Seen Near T, A Seen At Y, A Eats At C.
26. Representing the Graph... We start by identifying the nodes (Vertices) and the connections (Edges). NODES: the Events/Places and People/Orgs (Situation X, Situation Y, Target T, Cafe C, Combatant A, Bank X, Civilian P, Civilian Q, Civilian R, Civilian S). CONNECTIONS: the Facts (A Called P, A Seen Near X, and so on).
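A minimal Python sketch of this step, turning the table into a vertex set and a list of labelled edges. Two assumptions are made here: the short label X is resolved to Situation X in “A Seen Near X” and to Bank X in “A Banks at X” and “X Paid S”, following the later graph diagram; and the object of “A Eats At” is taken to be Cafe C, also from the diagram.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "source label target")

# Vertices, grouped by the columns of the source table.
vertex_kind = {
    "Situation X": "Event/Place", "Situation Y": "Event/Place",
    "Target T": "Event/Place", "Cafe C": "Event/Place",
    "Combatant A": "Person/Org", "Bank X": "Person/Org",
    "Civilian P": "Person/Org", "Civilian Q": "Person/Org",
    "Civilian R": "Person/Org", "Civilian S": "Person/Org",
}

# Each Fact becomes one labelled edge. Short labels such as "X" are
# resolved to full vertex names by hand (Situation X vs Bank X).
edges = [
    Edge("Combatant A", "Called", "Civilian P"),
    Edge("Combatant A", "Seen Near", "Situation X"),
    Edge("Civilian P", "Emailed", "Civilian S"),
    Edge("Civilian P", "Called", "Civilian Q"),
    Edge("Civilian Q", "Seen Near", "Target T"),
    Edge("Bank X", "Paid", "Civilian S"),
    Edge("Civilian R", "Seen Near", "Target T"),
    Edge("Civilian P", "Called", "Civilian R"),
    Edge("Combatant A", "Banks At", "Bank X"),
    Edge("Civilian S", "Seen Near", "Target T"),
    Edge("Combatant A", "Seen At", "Situation Y"),
    Edge("Combatant A", "Eats At", "Cafe C"),
]

# Sanity check: every edge must join two known vertices.
assert all(e.source in vertex_kind and e.target in vertex_kind for e in edges)
print(len(vertex_kind), "vertices,", len(edges), "edges")  # 10 vertices, 12 edges
```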
27. ...Representing the Graph... “Nodes” become VERTEX objects; “Connections” become EDGE objects.
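28. ...Representing the Graph... [Diagram: “Nodes” as VERTEX objects, “Connections” as EDGE objects. Combatant A is linked to Situation X (Seen Near), Situation Y (Seen At), Cafe C (Eats At), Civilian P (Called) and Bank X (Banks At); Civilian P is linked to Civilian Q and Civilian R (Called) and to Civilian S (Emailed); Bank X is linked to Civilian S (Paid); Civilians Q, R and S are each linked to Target T (Seen Near).]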
29. ...Analyzing the Graph... [Diagram: the graph from the previous slide, ready for analysis]
30. ...Threat Analysis [Diagram: the graph without Cafe C; the vertices linking Combatant A to Target T (Civilian P, Bank X and Civilians Q, R and S) are marked SUSPECTS, and Target T is marked NEEDS PROTECTION]
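The slides do not say how suspects are selected. One plausible rule, assumed here, is to enumerate every directed path from Combatant A to Target T and flag each vertex strictly between them. The sketch below does that with a depth-first search over the Facts, with short labels resolved to full names and “A Eats At” taken as Cafe C, as in the diagram.

```python
# Facts as directed (subject, relation, object) triples, with short
# labels resolved to full vertex names.
FACTS = [
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Eats At", "Cafe C"),
    ("Combatant A", "Banks At", "Bank X"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Bank X", "Paid", "Civilian S"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Civilian S", "Seen Near", "Target T"),
]


def paths(adj, node, goal, path):
    """Depth-first enumeration of simple directed paths ending at goal."""
    if node == goal:
        yield path
        return
    for nxt in adj.get(node, ()):
        if nxt not in path:
            yield from paths(adj, nxt, goal, path + [nxt])


adj = {}
for subj, _rel, obj in FACTS:
    adj.setdefault(subj, []).append(obj)

threat_paths = list(paths(adj, "Combatant A", "Target T", ["Combatant A"]))
# Everyone strictly between the threat and the target is a suspect.
suspects = sorted({v for p in threat_paths for v in p[1:-1]})
print(suspects)
# ['Bank X', 'Civilian P', 'Civilian Q', 'Civilian R', 'Civilian S']
```

Cafe C and Situations X and Y lie on no path to Target T, so this rule does not flag them.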
34. Graph Database Technologies • In Memory, e.g. YarcData, Apache Hama... • RDF stores: AllegroGraph, BigData, OpenLink Virtuoso, R2DF... • Document relationships: ArangoDB, OrientDB... • Single server or embedded graph DBMSs: DEX, Filament, Graphbase, HyperGraphDB, Neo4j, VertexDB... • Layers over existing DBMSs: Horton, InfoGrid, OQGraph... • Distributed Graph DBMSs: InfiniteGraph, Titan...
35. Graph Databases Post-2003
36. Graph Databases Compared [UNSW]: Support For Essential Graph Queries
37. THE BIG DATA CONNECTION PLATFORM
38. Conventional & Graph Analytics [Diagram: Data Visualization & Analytics tools working across existing Big Data solutions (Oracle or other vendors; two are annotated “*Now HP” and “*Now IBM”) plus the Big Data Connection Platform]
39. InfiniteGraph - The Enterprise Graph Database • A high-performance distributed database engine that supports analyst-time decision support and actionable intelligence. • Cost-effective link analysis: flexible deployment on commodity resources (hardware and OS). • Efficient, scalable, risk-averse technology, enterprise proven. • High-speed parallel ingest to load graph data quickly. • Parallel, distributed queries. • Flexible plugin architecture. • Complementary technology. • Fast proof of concept: easy-to-use Graph API.
40. Basic Capabilities Of Most Graph Databases • Rapid Graph Traversal • Inclusive or Exclusive Selection (diagram: traversal from a Start vertex that either includes or excludes vertex X) • Find the Shortest or All Paths Between Objects (diagram: Start to Finish)
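A minimal sketch of two of these capabilities, shortest path and exclusive selection, as a breadth-first search in Python. It assumes an unweighted graph given as an adjacency dict, so “shortest” means fewest hops; a weighted graph would need Dijkstra's algorithm instead, and finding all paths would use a depth-first enumeration. The `exclude` parameter is an illustrative assumption; inclusive selection could be modelled the same way with a set of allowed vertices.

```python
from collections import deque


def shortest_path(adj, start, finish, exclude=frozenset()):
    """Breadth-first search for a fewest-hops path from start to finish.

    Vertices in `exclude` are never entered (exclusive selection).
    Returns the path as a list, or None if finish is unreachable.
    """
    if start in exclude:
        return None
    prev = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == finish:
            path = []
            while node is not None:  # walk predecessors back to start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in prev and nxt not in exclude:
                prev[nxt] = node
                queue.append(nxt)
    return None


adj = {
    "Start": ["X", "B"],
    "X": ["Finish"],
    "B": ["C"],
    "C": ["Finish"],
}
print(shortest_path(adj, "Start", "Finish"))                 # ['Start', 'X', 'Finish']
print(shortest_path(adj, "Start", "Finish", exclude={"X"}))  # ['Start', 'B', 'C', 'Finish']
```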
42. Summary - Graph Analytics • Can Be Used For: – Social Network Analysis – Pattern finding in open source data – Logistics – Campaign planning – Energy usage, planning and protection • The technology works best if the graph is extracted from existing sources and stored in a Graph Database.
43. Thank You! Please take a look at objectivity.com for InfiniteGraph Online Demos, White Papers, Free Downloads, Samples & Tutorials. email@example.com