www.Objectivity.com

Getting Started with Graph Databases
Nick Quinn
Principal Engineer, InfiniteGraph

11/13/2013

Getting Started with Graph Databases


Exploit graph databases to discover value in complex Big Data. Lunch will be provided while you discover the power of graph database technology for your Big Data needs.

Bring your charged laptops to this upcoming meetup to walk through how to get started with InfiniteGraph. Nick Quinn, Senior Software Developer for InfiniteGraph, will walk you through the initial installation of InfiniteGraph and the HelloGraph sample to get you started with your graph database. Download InfiniteGraph for free here: http://www.objectivity.com/downloads

Once we get through the tutorial, there will be time for Q&A and more hands-on support from additional members of the InfiniteGraph technical team.

If you have a complex Big Data problem and are looking to discover deeper connections and relationships within your data to create next-generation applications for social networks, healthcare, finance, telecom and security, this is a must-attend event! Get started quickly with our enterprise-proven, massively scalable, distributed graph database!

Published in: Technology
  • Kevin Norwood Bacon, born 1958 in Pennsylvania
  • Relationships and connections are EVERYWHERE. Examples include CRM, Telecom, Intelligence, Research, Healthcare, Finance and yes, social networks too. But notice, it’s absolutely not just about social networks in the Facebook sense. ANY application that needs to find connections and relationships separated by more than 2 degrees is a good candidate for InfiniteGraph.
  • SIMPLE_BREADTH_FIRST: Traversal from a given vertex proceeds to all related vertices that are one degree of separation out before backtracking to traverse to related vertices that are two degrees of separation out, and so forth.
  • SIMPLE_DEPTH_FIRST: Traversal from a given vertex continues down a path until it reaches an endpoint before backtracking to the originating vertex to check for additional outgoing paths, and so forth.
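The difference between the two strategies described above is easiest to see as a visit order. The following is a minimal sketch in plain Java on a small in-memory adjacency list; the class `TraversalOrderSketch`, its methods and the sample vertices A to E are hypothetical names made up for illustration and are not part of the InfiniteGraph API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names come from the InfiniteGraph API.
class TraversalOrderSketch {
    // Adjacency list: vertex name -> outgoing neighbours, in insertion order.
    private final Map<String, List<String>> adjacency = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Breadth-first: discover everything one hop out before anything two hops out.
    List<String> breadthFirst(String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : adjacency.getOrDefault(current, List.of())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    // Depth-first: follow one path to its end, then backtrack.
    List<String> depthFirst(String start) {
        Set<String> visited = new LinkedHashSet<>();
        depthFirst(start, visited);
        return new ArrayList<>(visited);
    }

    private void depthFirst(String current, Set<String> visited) {
        if (!visited.add(current)) {
            return; // already seen on another path
        }
        for (String next : adjacency.getOrDefault(current, List.of())) {
            depthFirst(next, visited);
        }
    }

    // Sample graph: A -> B, A -> C, B -> D, C -> E.
    static TraversalOrderSketch sample() {
        TraversalOrderSketch g = new TraversalOrderSketch();
        g.addEdge("A", "B");
        g.addEdge("A", "C");
        g.addEdge("B", "D");
        g.addEdge("C", "E");
        return g;
    }
}
```

On the sample graph, `breadthFirst("A")` yields A, B, C, D, E, while `depthFirst("A")` yields A, B, D, C, E.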
  • Getting Started with Graph Databases

    1. Getting Started with Graph Databases
       Nick Quinn, Principal Engineer, InfiniteGraph
       www.Objectivity.com
       11/13/2013
    2. What are we talking about today?
       • Big Data and Databases
       • What is a Graph Database?
       • What is InfiniteGraph?
       • Demo and Q&A
         – Hands On
         – Installing InfiniteGraph
           • https://download.infinitegraph.com
         – FlightPlan Sample
           • http://wiki.infinitegraph.com → “Download Examples” → FlightPlanSample.zip
       Images Courtesy of IMDB (www.imdb.com)
    3. NoSQL 2013
       • Developers are embracing choice
       • More than Dynamo and BigTable clones
       • Incorporates specialized data models like Document, Object and Graph
       • 100+ projects and products (Wikipedia)
       • ~250 Meetup.com Groups (5 meetups this week!)
       • NoSQL fans consume 12% of the world’s Beer & Pizza
    4. NoSQL and BigData – What’s the Connection?
       Big data is a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools (Wikipedia)
       • Making big data “appear” smaller
       • Partitioning, replication & distributed query
       • Storage model optimizations
       • Consistency trade-offs
       • Simplified query models
       • Dynamic views
    5. The Specialist!
       • Everyone specializes
         – Doctors, Lawyers, Bankers, Developers
       • Why was data so normalized for so long?
       • NoSQL is all about the data specialist
       • Specializing in…
         – Distribution / deployment
         – Physical data storage
         – Logical data model
         – Query mechanism
    6. Polyglot NoSQL Architectures
       [Diagram labels: Users; Business; Applications; RDBMS; Document; Graph Database; Distributed Data Processing Platform; Transformation; MDM; Partitioned Distributed DB (often Document / KV); External / Legacy Data]
    7. NoSQL Landscape - How it all stacks up!

       Data Model           Performance  Scalability  Flexibility  Complexity  Functionality
       Key–value Stores     high         high         high         none        variable (none)
       Column Store         high         high         moderate     low         minimal
       Document Store       high         variable     high         low         variable (low)
       Graph Database       variable     variable     high         high        graph theory
       Relational Database  variable     variable     low          moderate    relational algebra

       From http://wikipedia.org/wiki/NoSQL
    8. Navigational Query Performance
    9. The Physical Data Model
       • Becoming a relationship specialist…

       Rows/Columns/Tables:

       Meetings
       P1      P2       Place    Time
       Alice   Bob      Denver   5-27-10

       Calls
       From    To       Time     Duration
       Bob     Carlos   13:20    25
       Bob     Charlie  17:10    15

       Payments
       From    To       Date     Amount
       Carlos  Charlie  5-12-10  100000

       Relationship/Graph Optimized:
       [Diagram: vertices Alice, Bob, Carlos and Charlie, connected by the edges Met 5-27-10 (Alice, Bob), Called 13:20 (Bob, Carlos), Called 17:10 (Bob, Charlie) and Paid 100000 (Carlos, Charlie)]
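To make the contrast on this slide concrete, here is a rough sketch in plain Java of the same relationships held first as flat rows and then as per-vertex adjacency lists. It is only an illustration of the idea, not InfiniteGraph or Objectivity/DB code: the class `PhysicalModelSketch`, the `Row` type and the method names are all hypothetical, and every edge is treated as directed for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two layouts; not InfiniteGraph or Objectivity/DB code.
class PhysicalModelSketch {
    // One row of a flat relationship table: (from, label, to).
    static final class Row {
        final String from;
        final String label;
        final String to;

        Row(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // The relationships from the slide, stored as table rows (directed for simplicity).
    static final List<Row> ROWS = List.of(
            new Row("Alice", "Met", "Bob"),
            new Row("Bob", "Called", "Carlos"),
            new Row("Bob", "Called", "Charlie"),
            new Row("Carlos", "Paid", "Charlie"));

    // Table layout: every hop scans the rows for matches.
    static List<String> neighboursByScan(String person) {
        List<String> result = new ArrayList<>();
        for (Row row : ROWS) {
            if (row.from.equals(person)) {
                result.add(row.to);
            }
        }
        return result;
    }

    // Graph layout: built once, after which each hop is a direct lookup from the vertex.
    static Map<String, List<String>> buildAdjacency() {
        Map<String, List<String>> adjacency = new HashMap<>();
        for (Row row : ROWS) {
            adjacency.computeIfAbsent(row.from, k -> new ArrayList<>()).add(row.to);
        }
        return adjacency;
    }
}
```

Both `neighboursByScan("Bob")` and `buildAdjacency().get("Bob")` return Carlos and Charlie; the difference is that the scan touches every row on every hop, while the adjacency lookup touches only the edges of the vertex being visited.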
    10. Sometimes Big Data is just Fast Data!
        • Some data is only actionable momentarily
          – Intelligence
          – IT Security
          – Site/page visit
          – Financial / trading behavior
        • Presents a different type of challenge
        • Latency of batch data processing becomes problematic
    11. Scaling Writes
        • Big/Fast data demands write performance
        • Most NoSQL solutions allow you to scale writes by…
          – Partitioning the data
          – Understanding your consistency requirements
          – Allowing you to defer conflicts
    12. Why a Graph Database?
    13. Relationships are everywhere
        • CRM, Sales & Marketing
        • Network Mgmt, Telecom
        • Intelligence (Government & Business)
        • PLM (Product Lifecycle Mgmt)
        • Finance
        • Social Networks
        • Healthcare
        • Research: Genomics
    14. Exploding Connections
        • More often than not… graphs are big!
    15. The Graph Database Landscape
        • Neo4J
        • Titan (Aurelius)
        • AllegroGraph (RDF)
        • FlockDB (Twitter)
        • DEX (Sparsity)
        • OrientDB (Document)
        • + 24 others (from wikipedia.org)
        Copyright © InfiniteGraph
    16. The Graph Database Landscape Cont’d
        • Graph Analytics: High latency, Batch Processing, offline
          – Apache Giraph
          – GraphLab
          – Intel’s Graph Builder
        • Visual Analytics: In Memory, High Performance, Poor Scalability
          – Tom Sawyer
          – D3JS
          – KeyLines
          – InfoVis
        • Tinkerpop stack (Blueprints/Gremlin)
          – 16 implementations and counting…
    17. Why InfiniteGraph™?
        • Objectivity/DB is a proven foundation
          – Building highly connected databases since 1993
          – A complete database management system
            • Concurrency, transactions, cache, schema, query, indexing
        • It’s a Graph Specialist!
          – Simple but powerful API tailored for navigation through data
          – Easy to configure distribution model
    18. InfiniteGraph™ Basic Architecture
        [Diagram labels: User Apps; Blueprints; InfiniteGraph - Core/API; Management Extensions; Navigation Execution; Placement; Session / TX Management; Configuration; Distributed Object and Relationship Persistence Layer]
    19. Fully Distributed Data Model
        [Diagram labels: AddVertex(); IG Core/API; ADP Placement; Distributed Object and Relationship Persistence Layer; Zone 1 (HostA, HostB, HostC); Zone 2 (HostX)]
    20. InfiniteGraph is a Complete Database
        • InfiniteGraph helps manage the things you don’t want to do, but want to have done:
          – Concurrency
            • Transactions (commit/rollback)
            • Controlled multi-user reading during updates
          – Schema Control
            • Build complex data structures, make changes easily and migrate existing data
          – Distribution
            • Sharing large amounts of distributed data between distributed processes
          – Indexes
            • Choose built-in key-value, b-tree or other indexes
          – Cache
            • Keep large sections of the graphs in configurable memory caches
    21. Scaling Graph Writes
        [Diagram labels: App-1 (Ingest V1) (E12 {V1→V2}); App-2 (Ingest V2) (E23 {V2→V3}); App-3 (Ingest V3); InfiniteGraph; Objectivity/DB Persistence Layer holding V1, E12, V2, E23, V3]
    22. High Performance Edge Ingest
        [Diagram labels: IG Core/API; Pipeline; Pipeline Containers; Agent; Target Containers C1, C2, C3; edges E12, E23, E(1->2), E(2->1), E(2->3), E(3->1), E(3->2)]
    23. Result…
        [Chart: Nodes and Edges per second (axis 0 to 500000), plotted for a single host and for 2, 4 and 8 hosts, with series for 1, 2, 4 and 8 clients]
    24. Scaling Reads and Query
        Partitioning and Read Replicas… easy, right?
        [Diagram labels: Application(s); Distributed API; four Processors; Partition 1, Partition 2, Partition 3, Partition ...n]
    25. Why are Graphs Different?
        [Diagram labels: Application(s); Distributed API; four Processors; Partition 1, Partition 2, Partition 3, Partition ...n]
    26. Optimizing Distributed Navigation
        • Detect local hops and perform in-memory traversal
          – Intelligently cache frequently accessed remote data
        • Route tasks to other hosts when it is optimal
        [Diagram labels: Application; Distributed API; two Processors; Partition 1; Partition 2; vertices A, B, C, D, E, F, G, X, Y; path P(A,B,C,D)]
    27. Super Simple API
        Person alice = new Person("Alice");
        helloGraphDB.addVertex(alice);
        Person bob = new Person("Bob");
        helloGraphDB.addVertex(bob);
        Person carlos = new Person("Carlos");
        helloGraphDB.addVertex(carlos);
        Person charlie = new Person("Charlie");
        helloGraphDB.addVertex(charlie);
    28. Adding Edges
        MyEdgeType edge = new MyEdgeType();
        vertexA.addEdge ( edge, vertexB, EdgeKind.???, weight );

        Meeting denverMeeting = new Meeting("Denver", "5-27-10");
        alice.addEdge(denverMeeting, bob, EdgeKind.BIDIRECTIONAL, (short)1);
        Call bobToCarlos = new Call(getRandomJulyTime());
        bob.addEdge(bobToCarlos, carlos, EdgeKind.OUTGOING, (short)0);
        Payment payment = new Payment(10000.00);
        carlos.addEdge(payment, charlie, EdgeKind.OUTGOING, (short)2);
        Call bobToCharlie = new Call(getRandomJulyTime());
        bob.addEdge(bobToCharlie, charlie, EdgeKind.INCOMING, (short)0);
    29. The Result…
    30. Graph Traversal (Navigation) Queries
        • Use an instance of the Navigator class to perform a navigation query.
        • A navigation instance is highly customizable, but consists of the following basic parts:
          – The vertex from which to start the navigation query.
          – A guide strategy, which is a high-level navigational aid. You can create a custom guide or use one of the built-in guide strategies:
            • Guide.Strategy.NONE
            • Guide.Strategy.SIMPLE_BREADTH_FIRST
            • Guide.Strategy.SIMPLE_DEPTH_FIRST
          – Qualifiers
            • A path qualifier
            • A result qualifier
          – Handlers
            • A result handler
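The roles of the parts listed on this slide can be sketched in plain Java without the product installed. The code below is not the InfiniteGraph Navigator class and does not use its signatures; `NavigationSketch`, `navigate` and the lambda-based qualifiers and handler are hypothetical stand-ins, and the guide strategy is fixed to a simple breadth-first order. It assumes that a path qualifier decides whether a path is extended, a result qualifier decides whether a path is reported, and a result handler receives each reported path.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical stand-in for a navigation query; not the InfiniteGraph Navigator class.
class NavigationSketch {
    private final Map<String, List<String>> adjacency = new HashMap<>();

    void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first navigation from the start vertex.
    // pathQualifier:   should this path be extended any further?
    // resultQualifier: should this path be reported as a result?
    // resultHandler:   receives every reported path.
    void navigate(String start,
                  Predicate<List<String>> pathQualifier,
                  Predicate<List<String>> resultQualifier,
                  Consumer<List<String>> resultHandler) {
        Deque<List<String>> frontier = new ArrayDeque<>();
        frontier.add(List.of(start));
        while (!frontier.isEmpty()) {
            List<String> path = frontier.remove();
            if (resultQualifier.test(path)) {
                resultHandler.accept(path);
            }
            if (!pathQualifier.test(path)) {
                continue; // prune: do not extend this path
            }
            String last = path.get(path.size() - 1);
            for (String next : adjacency.getOrDefault(last, List.of())) {
                if (path.contains(next)) {
                    continue; // avoid cycles within a single path
                }
                List<String> extended = new ArrayList<>(path);
                extended.add(next);
                frontier.add(extended);
            }
        }
    }

    // Example query: paths of at most two hops from Alice that end at Charlie.
    static List<List<String>> pathsFromAliceToCharlie() {
        NavigationSketch g = new NavigationSketch();
        g.addEdge("Alice", "Bob");
        g.addEdge("Bob", "Carlos");
        g.addEdge("Bob", "Charlie");
        g.addEdge("Carlos", "Charlie");
        List<List<String>> results = new ArrayList<>();
        g.navigate("Alice",
                path -> path.size() < 3,                             // extend only paths shorter than two hops
                path -> path.get(path.size() - 1).equals("Charlie"), // report paths ending at Charlie
                results::add);
        return results;
    }
}
```

With these qualifiers the only reported path is Alice, Bob, Charlie; the three-hop route through Carlos is pruned by the path qualifier before it reaches Charlie.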
    31. Schema – It’s not your enemy! (well not all the time...)
        • Schema vs Schema-less
          – Database religion
          – No time for a full debate here
          – InfiniteGraph supports schema
          – Planning to also support optional properties on schema types
        • Graph Views: A Great Use Case for Schema!
          – Filter by type and predicate during navigation
          – Connection Inference!
    32. Graph Views and Bacon!
        • Filter out uninteresting projects connected to Kevin Bacon
        GraphView view = new GraphView();
        // Excludes all instances of TvShow from navigation
        view.excludeClass(myDb.getTypeId(TvShow.class.getName()));
        // Excludes all movies made for TV/Video
        view.excludeClass(myDb.getTypeId(Movie.class.getName()), "details.madeForTv || details.madeForVideo");
        // Include ActedIn w/ characterName not containing "Himself"
        view.excludeClass(myDb.getTypeId(WorkedOn.class.getName()));
        view.includeClass(myDb.getTypeId(ActedIn.class.getName()), "!CONTAINS(characterName, \"Himself\")");
        [Diagram labels: Actor: Kevin Bacon; TV Show: The Following; Movie: Apollo 13; Movie: Behind the Scenes; character names: Ryan Hardy, Jack Swigert, Himself]
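The filtering idea behind the view code on this slide, excluding whole types or only the instances of a type that match a predicate, can be illustrated in plain Java. This is a sketch under assumptions and not the GraphView implementation: `ViewFilterSketch`, `Node`, `excludeType` and `isVisible` are hypothetical names, predicates are Java lambdas rather than predicate strings, and the made-for-TV movie entry is invented sample data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of type-and-predicate filtering; not the InfiniteGraph GraphView class.
class ViewFilterSketch {
    // A vertex with a type name and a flat property map.
    static final class Node {
        final String type;
        final Map<String, Object> properties;

        Node(String type, Map<String, Object> properties) {
            this.type = type;
            this.properties = properties;
        }
    }

    // Per type: a predicate that returns true when a node of that type must be skipped.
    private final Map<String, Predicate<Node>> exclusions = new HashMap<>();

    // Exclude every node of the given type.
    void excludeType(String type) {
        exclusions.put(type, node -> true);
    }

    // Exclude only the nodes of the given type that match the predicate.
    void excludeType(String type, Predicate<Node> predicate) {
        exclusions.put(type, predicate);
    }

    boolean isVisible(Node node) {
        Predicate<Node> exclusion = exclusions.get(node.type);
        return exclusion == null || !exclusion.test(node);
    }

    // Example: hide all TV shows and any movie flagged as made for TV.
    static List<String> visibleTitles() {
        ViewFilterSketch view = new ViewFilterSketch();
        view.excludeType("TvShow");
        view.excludeType("Movie", n -> Boolean.TRUE.equals(n.properties.get("madeForTv")));

        List<Node> nodes = List.of(
                new Node("TvShow", Map.of("title", "The Following")),
                new Node("Movie", Map.of("title", "Apollo 13", "madeForTv", false)),
                // Invented sample entry, used only to exercise the predicate.
                new Node("Movie", Map.of("title", "Hypothetical TV Movie", "madeForTv", true)));

        List<String> titles = new ArrayList<>();
        for (Node n : nodes) {
            if (view.isVisible(n)) {
                titles.add((String) n.properties.get("title"));
            }
        }
        return titles;
    }
}
```

Applied during a traversal, a check like `isVisible` would be evaluated on each candidate vertex before the path is extended, so filtered types never contribute hops. In this sample only "Apollo 13" remains visible.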
    33. Tools To Suit the Solution
    34. Demo
        • Installing InfiniteGraph
        • FlightPlan Sample