Graph Databases
and Neo4j
Data is getting bigger:
“Every 2 days we create as much
information as we did up to 2003”
– Eric Schmidt, Google
NoSQL
Key-Value Stores
Most are based on Dynamo: Amazon's Highly Available
Key-Value Store
Data Model:
Global key-value mapping
Big scalable Hash Map
Highly fault tolerant (typically)
Examples:
Redis, Riak, Voldemort
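Dynamo spreads its "big hash map" across machines with consistent hashing: keys and nodes are hashed onto the same ring, and each key belongs to the first node clockwise from it. The sketch below illustrates only that partitioning idea, assuming in-process dicts stand in for storage nodes; the node names and the `HashRing`/`PartitionedKVStore` classes are hypothetical, and Dynamo's replication and versioning are left out.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring: a key is owned by the first node position clockwise from it."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets several virtual positions to even out the load.
        self._ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        # Wrap around to the start of the ring if the key hashes past the last position.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class PartitionedKVStore:
    """A global key-value mapping split across nodes (one dict per node in this sketch)."""

    def __init__(self, nodes):
        self.ring = HashRing(nodes)
        self.shards = {node: {} for node in nodes}

    def put(self, key, value):
        self.shards[self.ring.owner(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.ring.owner(key)].get(key, default)


store = PartitionedKVStore(["node-a", "node-b", "node-c"])
store.put("user:42", {"name": "Alice"})
```

A useful property of this design is that adding a node only moves the keys that now fall on the new node's positions; every other key keeps its owner.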
Pros & Cons
Pros:
Simple data model
Scalable
Cons:
Create your own “foreign keys”
Poor for complex data
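"Create your own foreign keys" means the store only understands opaque keys, so relationships live in the values and the application follows and maintains them by hand. A minimal sketch, assuming a plain dict stands in for the store; the `user:`/`order:` key scheme and both helper functions are hypothetical.

```python
# A plain dict stands in for the key-value store; the key naming scheme is hypothetical.
kv = {
    "user:1": {"name": "Alice", "order_ids": ["order:100", "order:101"]},
    "order:100": {"item": "book", "user_id": "user:1"},
    "order:101": {"item": "lamp", "user_id": "user:1"},
}


def orders_for_user(store, user_key):
    """Application-side 'join': follow the stored keys with one lookup per order."""
    user = store[user_key]
    return [store[order_key] for order_key in user["order_ids"]]


def delete_order(store, order_key):
    """The store enforces no referential integrity, so the app must update both sides."""
    order = store.pop(order_key)
    store[order["user_id"]]["order_ids"].remove(order_key)
```

Forgetting the second step in `delete_order` would leave a dangling "foreign key" that the store itself would never detect.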
Column Family
Most are based on Bigtable: Google's Distributed
Storage System for Structured Data
Data Model:
A big table, with column families
Map Reduce for querying/processing
Examples:
HBase, HyperTable, Cassandra
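The column-family model can be pictured as nested maps: row key, then column family, then column, then value. Families are declared up front, but each row may hold different columns, which is what makes the model suit semi-structured data. The in-memory sketch below assumes that layout; the class, its methods and the `contents`/`anchor` families (loosely echoing the Webtable example from the Bigtable paper) are hypothetical and not the HBase or Cassandra API.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """row key -> column family -> column -> value, held in memory."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed; columns within them are not
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        # .get() avoids creating empty rows or families as a side effect.
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def scan(self, start, stop):
        """Yield rows with start <= key < stop in key order (sorted on each call here)."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]


webtable = ColumnFamilyTable(["contents", "anchor"])
webtable.put("com.example/a", "contents", "html", "<html>A</html>")
webtable.put("com.example/b", "contents", "html", "<html>B</html>")
webtable.put("com.example/b", "anchor", "com.example/a", "see A")  # only row b has this column
```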
Pros & Cons
Pros:
Supports Semi-Structured Data
Naturally Indexed (columns)
Scalable
Cons:
Poor for interconnected data
Document
Databases
Data Model:
A collection of documents
A document is a key-value collection
Index-centric, lots of map-reduce
Examples:
CouchDB, MongoDB
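Queries that go beyond keys and indexes are typically expressed as map-reduce over the collection: a map function emits key/value pairs from each document and a reduce function folds the values for each key. CouchDB views, for instance, are defined as JavaScript map/reduce functions; the Python below only sketches the pattern and is not either database's API. The `posts` documents and helper names are hypothetical.

```python
from collections import defaultdict

# A collection is a list of documents; each document is a key-value collection (a dict).
# Documents need not share fields: only one has "draft".
posts = [
    {"_id": 1, "author": "alice", "tags": ["neo4j", "graphs"]},
    {"_id": 2, "author": "bob", "tags": ["mongodb"]},
    {"_id": 3, "author": "alice", "tags": ["graphs", "cypher"], "draft": True},
]


def map_reduce(docs, mapper, reducer):
    """Group the (key, value) pairs emitted by mapper, then fold each group with reducer."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def emit_tags(doc):
    for tag in doc.get("tags", []):
        yield tag, 1


def count(key, values):
    return sum(values)


tag_counts = map_reduce(posts, emit_tags, count)
```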
Pros & Cons
Pros:
Simple, powerful data model
Scalable
Cons:
Poor for interconnected data
Query model limited to keys and indexes
Map-reduce needed for larger queries
Graph
Databases
Data Model:
Nodes and Relationships
Examples:
Neo4j, OrientDB, InfiniteGraph,
AllegroGraph
Pros & Cons
Pros:
Powerful data model, as general as RDBMS
Connected data locally indexed
Easy to query
Cons:
Requires rewiring your brain
[Figure: Complexity vs. Size, placing Key-Value Stores, Big Table Clones, Document Databases, Graph Databases and Relational Databases; annotation: "90% of Use Cases"]
Graph Database definition
A graph database uses graph structures with nodes, edges
and properties to represent and store data.
By definition, a graph database is any storage system that
provides index-free adjacency: every element contains a
direct pointer to its adjacent elements, so no index lookups
are necessary.
Graph databases focus on the interconnections between
entities.
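Index-free adjacency can be contrasted with a relational-style edge table. In the sketch below, each hypothetical `Node` object keeps direct references to its neighbours, so traversal cost depends only on that node's degree; the edge-table version must first consult an index built over the whole table (a dict here; a database B-tree index costs O(log n) per lookup). All names are illustrative assumptions, not Neo4j internals.

```python
class Node:
    """Each node holds direct references to its neighbours: no index lookup to traverse."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct pointers to adjacent Node objects

    def connect(self, other):
        self.out.append(other)


def neighbours_via_pointers(node):
    # Work is proportional to this node's degree, not to the size of the graph.
    return [n.name for n in node.out]


# For contrast: a relational-style edge table. Finding neighbours means scanning
# every row or building and consulting an index over the whole table.
edge_rows = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]


def build_index(rows):
    index = {}
    for src, dst in rows:
        index.setdefault(src, []).append(dst)
    return index


def neighbours_via_index(index, name):
    return index.get(name, [])


alice, bob, carol, dave = (Node(n) for n in ["alice", "bob", "carol", "dave"])
alice.connect(bob)
alice.connect(carol)
bob.connect(dave)
```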
Compared with RDBMS
• Graph databases are often faster for associative data sets
• They map more directly to the structure of object-oriented
applications
• They scale more naturally to large data sets, as they do not
typically require expensive join operations
• As they depend less on a rigid schema, they are better suited
to managing ad-hoc and changing data with evolving schemas
Finding Extended Friends
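Finding extended friends (friends, friends of friends, and so on) is a breadth-first traversal that stops at a chosen depth; in SQL it would take one self-join per level. A minimal sketch, assuming a hypothetical in-memory `knows` graph stored as adjacency sets. In Cypher, a variable-length pattern such as `-[:KNOWS*1..2]-` expresses a similar two-hop search declaratively.

```python
from collections import deque

# Hypothetical social graph: person -> people they know (undirected, stored both ways).
knows = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob", "frank"},
    "erin": {"carol"},
    "frank": {"dave"},
}


def extended_friends(graph, start, max_depth=2):
    """Breadth-first traversal: everyone within max_depth hops, mapped to their distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_depth:
            continue  # do not expand past the depth limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    del seen[start]
    return seen
```

For `extended_friends(knows, "alice")`, Frank is three hops away and is therefore excluded at the default depth of 2.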
Nodes
Nodes represent entities such as people, businesses, accounts,
or any other item you might want to keep track of.
Properties
Properties are pertinent information that relate to nodes (in Neo4j's property graph model, relationships can carry properties too).
Edges
Edges are the lines that connect nodes to nodes or nodes to
properties and they represent the Relationship between the
two.
Most of the important information is really stored in the
edges.
Meaningful patterns emerge when one examines the
connections and interconnections of nodes, properties and
edges.
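The three building blocks map naturally onto two record types: nodes with properties, and typed relationships that point at a start and an end node and may carry properties of their own. The dataclasses below are a hypothetical sketch of that property graph model, not Neo4j's storage format; `Node`, `Relationship`, `employees_of` and the sample data are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)  # edges can hold data too


alice = Node(1, {"name": "Alice"})
acme = Node(2, {"name": "Acme"})
# The fact "since 2010" belongs to the connection, not to either node.
works_at = Relationship("WORKS_AT", alice, acme, {"since": 2010})


def employees_of(company, relationships):
    """Follow WORKS_AT edges that end at the given company node."""
    return [r.start.properties["name"] for r in relationships
            if r.type == "WORKS_AT" and r.end is company]
```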
What is Neo4j?
• A Graph Database
• Property Graph
• Full ACID (atomicity, consistency, isolation, durability)
• High Availability (with Enterprise Edition)
• Up to 32 billion nodes, 32 billion relationships and
64 billion properties
• Embedded Server
• REST API
Key Features
• Runs on major platforms: Mac | Windows | Unix
• Extensive documentation
• Active community
• Open Source
CYPHER
Cypher is a declarative graph query language that allows for
expressive and efficient querying and updating of the graph
store without having to write traversals through the graph
structure in code.
CYPHER
START: Starting points in the graph, obtained via index lookups or by element IDs.
MATCH: The graph pattern to match, bound to the starting points in START.
WHERE: Filtering criteria.
RETURN: What to return.
CREATE: Creates nodes and relationships.
DELETE: Removes nodes, relationships and properties.
SET: Sets property values.
FOREACH: Performs updating actions once per element in a list.
WITH: Divides a query into multiple, distinct parts.
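Each read clause corresponds to a step that would otherwise be hand-coded as a traversal. The Python below spells out those steps for a small query written in the START-based syntax these slides describe (shown in the comments); it is a hypothetical illustration of the clause semantics, not how Neo4j executes queries, and the `nodes`/`outgoing` data is invented.

```python
# Hypothetical in-memory graph: node id -> properties, plus outgoing typed edges.
nodes = {
    1: {"name": "Alice", "age": 29},
    2: {"name": "Bob", "age": 35},
    3: {"name": "Carol", "age": 41},
    4: {"name": "Dave", "age": 25},
}
outgoing = {
    1: [("KNOWS", 2), ("KNOWS", 3), ("KNOWS", 4)],
    2: [("KNOWS", 3)],
}


def older_friends(start_id):
    # START alice = node(1): bind the starting point by element ID
    alice = start_id
    # MATCH (alice)-[:KNOWS]->(friend): expand the pattern from the starting point
    friends = [dst for rel, dst in outgoing.get(alice, []) if rel == "KNOWS"]
    # WHERE friend.age > 30: filter the matched results
    friends = [f for f in friends if nodes[f]["age"] > 30]
    # RETURN friend.name: project the requested values
    return [nodes[f]["name"] for f in friends]
```

With Cypher, only the four commented lines would be written; the loops and filters are left to the database.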

Editor's Notes

  • #5 Dynamo is a set of techniques. Fault tolerant: the system can continue operating after some of its components fail.
  • #8 Interconnected data: data items that are linked to one another.