An Introduction to Graph Databases
 


This tutorial will provide you with a basic understanding of graph database technology and the ability to quickly begin development of a graph database application. You will have the capability to recognize graph-based problems and present the benefits of using graph technology for problem resolution.
The tutorial will give you an understanding of:
• Graph theory - origins and concepts
• Benefits of graph databases
• Different types of graph databases
• Typical graph database API
• Programming basics
• Use cases

Bring your laptop for a hands-on opportunity to practice with some sample code. A basic understanding of Java programming is a recommended prerequisite for this course. This session is led by the InfiniteGraph technical team and the demonstration code will be drawn from InfiniteGraph examples; however, the broader educational presentation is product-neutral and not a commercial presentation of their products.

To participate in the hands-on portion of the graph tutorial users must have:
• Java programming experience
• Java Development Kit (JDK)
• The current version of InfiniteGraph installed on their laptop (to download, visit www.objectivity.com/infinitegraph)
• HelloGraph test: after installing InfiniteGraph, run HelloGraph to test the installation (HelloGraph can be found online at http://wiki.infinitegraph.com/2.1/w/index.php?title=Download_Sample_Code)

Leon Guzenda was one of the founding members of Objectivity in 1988 and one of the original architects of Objectivity/DB. He currently works with Objectivity's major customers to help them effectively develop and deploy complex applications and systems that use the industry's highest-performing, most reliable DBMS technology, Objectivity/DB. He also liaises with technology partners and industry groups to help ensure that Objectivity/DB remains at the forefront of database and distributed computing technology. Leon has more than 35 years of experience in the software industry. At Automation Technology Products, he managed the development of the ODBMS for the Cimplex solid modeling and numerical control system. Before that, he was Principal Project Director for International Computers Ltd. in the United Kingdom, delivering major projects for NATO and leading multinationals. He was also design and development manager for ICL's 2900 IDMS product. He spent the first 7 years of his career working in defense and government systems. Leon has a B.S. degree in Electronic Engineering from the University of Wales.

  • By adopting a polyglot approach, one can use existing SQL-based architecture and databases while still gaining the competitive advantage that the latest NoSQL technologies provide. One example of this polyglot approach is shown here. The technologies used would depend on the use case.
  • Note that object-oriented databases are counted as NoSQL here.

An Introduction to Graph Databases: Presentation Transcript

  • An Introduction To Graph Databases, Leon Guzenda & Nick Quinn, August 20, 2013 (www.Objectivity.com)
  • Overview
    • Introductions
    • Graph Theory
    • Commonly Used Graph Algorithms
    • Graph Databases
    • Current Implementations
    • Use Cases
    • Hands-On Tutorial
  • We Are From Objectivity Inc.
    • Company
      – Objectivity, Inc. is headquartered in Sunnyvale, CA.
      – Established in 1988 to tackle database problems that network, hierarchical, relational and file-based technologies struggle with.
      – Objectivity has over two decades of Big Data and NoSQL experience.
    • Products
      – Develops NoSQL platforms for managing and discovering relationships and patterns in complex data:
      – Objectivity/DB: an object database that manages localized, centralized or distributed databases
      – InfiniteGraph: a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data
    • Markets
      – The Big Data market is projected to be around $12B in 2012, with a CAGR of 28% over the next five years.
      – 40% per year data growth, cloud adoption, mobile usage and improved real-time analytics underpin Objectivity’s growth opportunities as a Big Data analytics enabler.
    • Customers
      – Embedded in hundreds of enterprises, government organizations and products: millions of deployments.
    • Financials
      – Consistently generates increased revenues.
      – Privately held by the employees and a few venture capital companies.
    Copyright © Objectivity, Inc. 2012
  • GRAPH THEORY
  • The History of Graph Theory
    • 1736: Leonhard Euler writes a paper on the “Seven Bridges of Königsberg”
    • 1845: Gustav Kirchhoff publishes his electrical circuit laws
    • 1852: Francis Guthrie poses the “Four Color Problem”
    • 1878: Sylvester publishes an article in Nature magazine that describes graphs
    • 1936: Dénes Kőnig publishes a textbook on Graph Theory
    • 1941: Ramsey and Turán define Extremal Graph Theory
    • 1959: De Bruijn publishes a paper summarizing Enumerative Graph Theory
    • 1959: Erdős, Rényi and Gilbert define Random Graph Theory
    • 1969: Heinrich Heesch publishes methods for attacking the “Four Color” problem (the theorem was finally proved by Appel and Haken in 1976)
    • 2003: Commercial Graph Database products start appearing on the market
  • Graph Theory Terminology...
    • VERTEX: A single node in a graph data structure
    • EDGE: A connection between a pair of Vertices
    • PROPERTIES: Data items that belong to a particular Vertex or Edge
    • WEIGHT: A quantity associated with a particular Edge
    • GRAPH: A network of linked Vertex and Edge objects
    Example: Vertex 1 (City: San Francisco, Pop: 812,826) is linked by Edge 1 (Road: US-101, Miles: 47.8) to Vertex 2 (City: San Jose, Pop: 967,487)
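The terminology above maps directly onto a property-graph data structure. The following minimal plain-Java sketch rebuilds the San Francisco / San Jose example. It uses hypothetical `Vertex` and `Edge` records (not the InfiniteGraph API; records need Java 16+), stores each item's PROPERTIES in a map, and treats the edge's `Miles` property as its WEIGHT.

```java
import java.util.Map;

// Plain-Java sketch of the terminology (not the InfiniteGraph API):
// vertices and edges both carry PROPERTIES; the edge's "Miles" property
// serves as its WEIGHT.
final class PropertyGraphDemo {
    record Vertex(String id, Map<String, Object> properties) {}

    // An EDGE links exactly one pair of vertices.
    record Edge(String id, Vertex from, Vertex to, Map<String, Object> properties) {
        double weight() {
            return ((Number) properties.get("Miles")).doubleValue();
        }
    }

    static Edge sampleEdge() {
        Vertex sf = new Vertex("Vertex 1", Map.of("City", "San Francisco", "Pop", 812_826));
        Vertex sj = new Vertex("Vertex 2", Map.of("City", "San Jose", "Pop", 967_487));
        return new Edge("Edge 1", sf, sj, Map.of("Road", "US-101", "Miles", 47.8));
    }

    public static void main(String[] args) {
        Edge e = sampleEdge();
        System.out.printf("%s -> %s via %s: %.1f miles%n",
                e.from().properties().get("City"),
                e.to().properties().get("City"),
                e.properties().get("Road"),
                e.weight());
    }
}
```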
  • ...Graph Theory Terminology... SIMPLE/UNDIRECTED GRAPH: A Graph where each VERTEX may be linked to one or more Vertex objects via Edge objects and each Edge object is connected to exactly two Vertex objects. Furthermore, neither Vertex connected to an Edge is more significant than the other. DIRECTED GRAPH: A Simple/Undirected Graph where one Vertex in a Vertex + Edge + Vertex group (an “Arc” or “Path”) can be considered the “Head” of the Path and the other can be considered the “Tail”. MIXED GRAPH: A Graph in which some paths are Undirected and others are Directed.
  • ...Graph Theory Terminology LOOP: An Edge that is doubly-linked to the same Vertex MULTIGRAPH: A Graph that allows multiple Edges and Loops QUIVER: A Graph where Vertices are allowed to be connected by multiple Arcs. A Quiver may include Loops. WEIGHTED GRAPH: A Graph where a quantity is assigned to an Edge, e.g. a Length assigned to an Edge representing a road between two Vertices representing cities.  HALF EDGE: An Edge that is only connected to a single Vertex  LOOSE EDGE: An Edge that isn't connected to any Vertices.  CONNECTIVITY: Two Vertices are Connected if it is possible to find a path between them.
  • COMMONLY USED GRAPH ALGORITHMS Mac Evans
  • Commonly Used Graph Algorithms... CONNECTEDNESS: Check whether or not a set of nodes in a Graph are connected. All of the nodes in the graph below are connected, e.g. A to B, A to C via B etc. SHORTEST PATH: The path between two nodes that visits the fewest intermediate nodes. In the graph above, A->B->C->D is shorter than A->B->C->B->D (disallowing loops) NODE DEGREE: The degree of a node in a network is a count of the number of connections it has to other nodes. The degree distribution is the probability distribution of these degrees in the whole network. In the graph below, A and D have a node degree of 1. B and C have a node degree of 3.
  • ...Commonly Used Graph Algorithms... CENTRALITY: An assessment of the importance of a node within a network. Degree Centrality is the simplest, being a count of the number of connections that a node has. It may be expressed as “Indegree” (# of incoming connections) and “Outdegre” (# of outgoing connections).
  • ...Commonly Used Graph Algorithms... CLOSENESS CENTRALITY: Closeness considers the shortest paths between nodes and assigns a higher value to nodes that can be used to reach most other nodes most quickly. In the graph below, node A has the greatest centrality as all other nodes can be reached in one “hop”, whereas others require 1 hop to A or 2 hops to any other node. A
  • Commonly Used Graph Algorithms... CONNECTEDNESS: Check whether or not a set of nodes in a Graph are connected. All of the nodes in the graph below are connected, e.g. A to B, A to C via B etc. SHORTEST PATH: The path between two nodes that visits the fewest intermediate nodes. In the graph above, A->B->C->D is shorter than A->B->C->B->D (disallowing loops) NODE DEGREE: The degree of a node in a network is a count of the number of connections it has to other nodes. The degree distribution is the probability distribution of these degrees in the whole network. In the graph below, A and D have a node degree of 1. B andC have a node degree of 3.
  • ...Commonly Used Graph Algorithms... SHORTEST PATH: The path between two nodes that visits the fewest intermediate nodes. In the graph below, A->B->C->D is shorter than A->B->C->B->D (disallowing loops) AVERAGE PATH LENGTH: The average of all path lengths between all pairs of nodes in a graph. TRANSITIVE CLOSURE: The process of exploring a graph by traversing relationships until all nodes have been visited, but without revisiting nodes that are joined together in loops. In the graph above, A->B->C->D is a transitive closure.
  • ...Commonly Used Graph Algorithms... GRAPH DIAMETER (or SPAN): The greatest distance between any pair of nodes in a graph. It is computed by finding the shortest path between each pair of nodes. The maximum of these path lengths is a measure of the diameter of the graph. The diameters of the two graphs below are 2 and 5.
  • ...Commonly Used Graph Algorithms... BETWEENESS CENTRALITY: A centrality measure of a node within a graph. Nodes that have a high probability of being visited on a randomly chosen short path between two randomly chosen nodes have a high “betweeness” In the graph below, node D has the highest betweeness centrality.
  • GRAPH DATABASES
  • Recognizing Graphs In Object Models [Diagram: Tree Structures, with 1-to-Many Relationship Data between Object Class A instances; Graph (Network) Structures, with Many-to-Many Relationship Data between Object Class A instances]
  • Why Do We Need Graph DBMSs?... [Diagram: a relational database with tables Table_A through Table_G] Think about the SQL query for finding all links between the two “blue” rows... Good luck! Relational databases aren’t good at handling complex relationships!
  • ...Graph DBMSs Are Designed To Handle Relationships [Diagram: the same tables, with rows A3 and G4 highlighted] With Objectivity/DB or InfiniteGraph, the solution can be found with a few lines of code.
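To illustrate the "few lines of code" claim, the sketch below treats each row as a vertex and each foreign-key link as an edge, then enumerates every simple path between rows A3 and G4 with a depth-first search. It is plain Java, not the Objectivity/DB or InfiniteGraph API, and every other row ID and link is hypothetical.

```java
import java.util.*;

// Sketch: find ALL simple paths between two records, treating each row as a
// vertex and each link as an undirected edge. Row IDs other than A3 and G4
// are hypothetical; this is plain Java, not a product API.
final class AllPathsDemo {
    static Map<String, List<String>> rows() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        String[][] links = {{"A3", "B1"}, {"A3", "C2"}, {"B1", "D5"}, {"C2", "D5"},
                            {"D5", "G4"}, {"C2", "E7"}, {"E7", "G4"}};
        for (String[] l : links) {
            g.computeIfAbsent(l[0], k -> new ArrayList<>()).add(l[1]);
            g.computeIfAbsent(l[1], k -> new ArrayList<>()).add(l[0]);
        }
        return g;
    }

    static List<List<String>> allPaths(Map<String, List<String>> g, String from, String to) {
        List<List<String>> result = new ArrayList<>();
        extend(g, from, to, new ArrayList<>(List.of(from)), result);
        return result;
    }

    // Depth-first search; a node already on the current path is never re-entered.
    private static void extend(Map<String, List<String>> g, String current, String to,
                               List<String> path, List<List<String>> result) {
        if (current.equals(to)) {
            result.add(List.copyOf(path));
            return;
        }
        for (String next : g.getOrDefault(current, List.of())) {
            if (!path.contains(next)) {
                path.add(next);
                extend(g, next, to, path, result);
                path.remove(path.size() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        allPaths(rows(), "A3", "G4").forEach(System.out::println);
    }
}
```

Because the current path is checked before each step, loops in the data cannot cause infinite recursion. However, the number of simple paths can grow exponentially on dense graphs, so a real query would normally bound the path length.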
  • Graph Databases
    • Data model:
      – Node (Vertex) and Relationship (Edge) objects
      – Directed
      – May be a hypergraph (edges with multiple endpoints)
    • Examples:
      – InfiniteGraph, Neo4j, OrientDB, AllegroGraph, TitanDB and Dex
    [Diagram: VERTEX and EDGE objects, with cardinalities 2 and N]
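The difference between an ordinary edge and a hyperedge is easy to express as data types. In this plain-Java sketch (hypothetical records, not any product's API; Java 16+), an `Edge` has exactly two endpoints, while a `HyperEdge` holds a set of endpoints of any size. Two vertices are treated as neighbors if they share a hyperedge.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch contrasting an ordinary directed EDGE (exactly two endpoints) with a
// HYPEREDGE (any number of endpoints). Hypothetical records, not a product API.
final class HypergraphDemo {
    record Edge(String tail, String head, String label) {}

    record HyperEdge(String label, Set<String> endpoints) {
        HyperEdge {
            endpoints = Set.copyOf(endpoints); // defensive, immutable copy
        }
    }

    // Vertices that share at least one hyperedge with v.
    static Set<String> neighbors(List<HyperEdge> edges, String v) {
        Set<String> result = new TreeSet<>();
        for (HyperEdge e : edges) {
            if (e.endpoints().contains(v)) result.addAll(e.endpoints());
        }
        result.remove(v);
        return result;
    }

    public static void main(String[] args) {
        Edge call = new Edge("Alice", "Bob", "called");
        HyperEdge meeting = new HyperEdge("meeting", Set.of("Alice", "Bob", "Carol"));
        System.out.println(call);
        System.out.println(neighbors(List.of(meeting), "Carol")); // [Alice, Bob]
    }
}
```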
  • Graph DBMSs Use A Very Simple Object Model [Diagram: the Tree (1-to-Many) and Graph (Network, Many-to-Many) structures of Object Class A map onto a GRAPH MODEL of VERTEX objects linked by EDGE objects that hold the relationship data]
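  • Basic Capabilities Of Most Graph Databases
    • Rapid graph traversal (from Start to Finish)
    • Inclusive or exclusive selection [Diagram: traversals from Start that include or exclude the nodes marked X]
    • Find the shortest path or all paths between objects (from Start to Finish)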
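Inclusive and exclusive selection can be modeled as a predicate that decides which nodes a traversal may enter. The plain-Java sketch below applies such a filter inside a breadth-first search. The graph, the node named X and the `hops` method are hypothetical illustrations, not a product API.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch: breadth-first traversal with a selection filter. Nodes rejected by
// the filter are never entered, so excluding a node can force a longer route
// or make the finish unreachable. The graph is hypothetical.
final class FilteredTraversal {
    static Map<String, List<String>> sample() {
        Map<String, List<String>> g = new HashMap<>();
        String[][] edges = {{"Start", "X"}, {"X", "Finish"}, {"Start", "P"}, {"P", "Q"}, {"Q", "Finish"}};
        for (String[] e : edges) {
            g.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            g.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return g;
    }

    // Hop count of the shortest admissible path, or -1 if none exists.
    // The start node is always admitted; the filter applies to nodes reached from it.
    static int hops(Map<String, List<String>> g, String start, String finish, Predicate<String> allowed) {
        Map<String, Integer> dist = new HashMap<>(Map.of(start, 0));
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (v.equals(finish)) return dist.get(v);
            for (String w : g.getOrDefault(v, List.of())) {
                if (allowed.test(w) && !dist.containsKey(w)) {
                    dist.put(w, dist.get(v) + 1);
                    queue.add(w);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        var g = sample();
        System.out.println(hops(g, "Start", "Finish", v -> true));           // 2, via X
        System.out.println(hops(g, "Start", "Finish", v -> !v.equals("X"))); // 3, excluding X
        System.out.println(hops(g, "Start", "Finish",
                v -> v.equals("X") || v.equals("Finish")));                  // 2, only via X
    }
}
```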
  • InfiniteGraph Capabilities
    • Parallel graph traversal
    • Inclusive or exclusive selection
    • Shortest or all paths between objects
    • Computational & visualization plug-ins [Diagram: compute cost to date from Start to Finish; visualize from Start]
  • CURRENT IMPLEMENTATIONS
  • Graph Databases Pre-2003
  • Graph Databases Post-2003 X Titan
  • Graph Databases Compared [UNSW] DATA STORAGE FEATURES
  • Graph Databases Compared [DZone] Source: http://goo.gl/ni4eoE
  • Graph Databases – Pros and Cons
    • Strengths:
      – Extremely fast for connected data
      – Typically scales out
      – Easy to query (navigation)
      – Simple data model
    • Weaknesses:
      – May not support distribution or sharding
      – Requires a conceptual shift... a different way of thinking
  • USE CASES
  • Example 1 - Market Analysis The 10 companies that control a majority of U.S. consumer goods brands
  • Example 2 - Demographics Used in social network analysis, marketing, medical research etc.
  • Example 3 - Seed To Consumer Tracking ?
  • Example 4 - Ad Placement Networks: smartphone ad placement based on the user’s profile and location data captured by opt-in applications.
    • The location data can be stored and distilled in a hybrid key-value and column-store database, such as Cassandra.
    • The locations are matched with geospatial data to deduce user interests.
    • As ad placement orders arrive, an application built on a graph database such as InfiniteGraph matches groups of users with ads, which:
      – Maximizes relevance for the user.
      – Yields maximum value for the advertiser and the placer.
  • Example 5 - Healthcare Informatics
    • Problem: Physicians need better electronic records for managing patient data on a global basis, and for matching symptoms, causes, treatments and interdependencies to improve diagnoses and outcomes.
    • Solution: Create a database capable of leveraging existing architecture, using NoSQL tools such as Objectivity/DB and InfiniteGraph, that can handle data capture, symptoms, diagnoses, treatments, reactions to medications, interactions and progress.
    • Result: It works:
      – Diagnosis is faster and more accurate.
      – The knowledge base tracks similar medical cases.
      – Treatment success rates have improved.
  • Example 6 - Big Data Analytics
  • Example 7 – Visual Analytics
  • Hands On With A Graph Database
    • We'll be using InfiniteGraph today
    • You'll need a Java development environment on your machine
    • If you haven't downloaded InfiniteGraph already, please go to: http://goo.gl/XzJo6T [https://download.infinitegraph.com/index.aspx]
    • We'll be covering a HelloGraph and a more complex sample program