2. Overview
Connected (or Linked) Data and the NO-SQL movement.
Modeling Linked Data:
Using the Graph data structure.
Property Graphs.
Graph-based Database:
Neo4j
Two schools of thought for querying a graph store:
Traversal-based
Pattern-matching
4. Linked Data According to the W3C
Large-scale integration of, and reasoning on, data on the Web.
Standardized format:
Resource Description Framework (RDF)
Access to data:
XML, XHTML, etc.
Published relationships between data.
Query endpoints for convenient access to the data (e.g. SPARQL).
Related W3C technologies:
RDF, GRDDL, POWDER, RDFa, the upcoming R2RML, RIF, SPARQL
@see: http://www.w3.org/standards/semanticweb/data
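To make the RDF idea concrete: RDF expresses data as (subject, predicate, object) triples, and a SPARQL basic graph pattern is a set of triple patterns whose variables must bind consistently across all of them. The sketch below is a toy, in-memory illustration of that matching, not an RDF library or a SPARQL engine; the `ex:` names, `TRIPLES`, and `query` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# A toy RDF-style dataset: each fact is a (subject, predicate, object) triple.
# The "ex:" resources are hypothetical placeholders.
TRIPLES: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple,
                 binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Unify one triple pattern with one triple, extending an existing binding."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def query(patterns: List[Triple], triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield every binding under which all patterns match (a basic graph pattern)."""
    def solve(i: int, binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(patterns):
            yield binding
            return
        for triple in triples:
            extended = match_triple(patterns[i], triple, binding)
            if extended is not None:
                yield from solve(i + 1, extended)
    yield from solve(0, {})


# Roughly: SELECT ?name WHERE { ex:alice foaf:knows ?f . ?f foaf:knows ?fof . ?fof foaf:name ?name }
results = [b["?name"] for b in query(
    [("ex:alice", "foaf:knows", "?f"),
     ("?f", "foaf:knows", "?fof"),
     ("?fof", "foaf:name", "?name")],
    TRIPLES,
)]
print(results)  # ['Carol']
```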
5. Modeling Linked Data: Graphs
“A Graph —records data in→ Nodes —which
have→ Properties”
Graph: An object that contains vertices and edges.
Element: An object that can have any number of key/value pairs associated with it (i.e. properties).
Vertex: An object that has incoming and outgoing edges.
Edge: An object that has a tail and head vertex.
@see: https://github.com/tinkerpop/blueprints/wiki/property-graph-model
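The four Blueprints definitions map naturally onto a small class hierarchy: `Vertex` and `Edge` both inherit the property map from `Element`, and `Graph` holds both. This is a hedged Python sketch of the model only; the class and method names are illustrative and are not the Blueprints (Java) or Neo4j API.

```python
from itertools import count
from typing import Any, Dict, List


class Element:
    """Anything that can carry any number of key/value properties."""
    _ids = count()  # shared counter so every element gets a distinct id

    def __init__(self) -> None:
        self.id = next(Element._ids)
        self.properties: Dict[str, Any] = {}


class Vertex(Element):
    """An element with incoming and outgoing edges."""
    def __init__(self) -> None:
        super().__init__()
        self.out_edges: List["Edge"] = []
        self.in_edges: List["Edge"] = []


class Edge(Element):
    """An element with a tail vertex and a head vertex."""
    def __init__(self, tail: Vertex, head: Vertex) -> None:
        super().__init__()
        self.tail = tail
        self.head = head


class Graph:
    """An object that contains vertices and edges."""
    def __init__(self) -> None:
        self.vertices: List[Vertex] = []
        self.edges: List[Edge] = []

    def add_vertex(self) -> Vertex:
        v = Vertex()
        self.vertices.append(v)
        return v

    def add_edge(self, tail: Vertex, head: Vertex) -> Edge:
        e = Edge(tail, head)
        tail.out_edges.append(e)  # the edge leaves its tail...
        head.in_edges.append(e)   # ...and enters its head
        self.edges.append(e)
        return e


g = Graph()
alice, bob = g.add_vertex(), g.add_vertex()
alice.properties["name"] = "Alice"
knows = g.add_edge(alice, bob)
knows.properties["since"] = 2010  # edges carry properties too
```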
6. Modeling Linked Data: Property Graphs
A property graph has these elements:
a set of vertices
each vertex has a unique identifier.
each vertex has a set of outgoing edges.
each vertex has a set of incoming edges.
each vertex has a collection of properties defined by a map from key to value.
a set of edges
each edge has a unique identifier.
each edge has an outgoing tail vertex.
each edge has an incoming head vertex.
each edge has a label that denotes the type of relationship between its two vertices.
each edge has a collection of properties defined by a map from key to value.
@see: https://github.com/tinkerpop/blueprints/wiki/property-graph-model
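A minimal sketch of the property-graph elements listed above, assuming string ids and an in-memory dict store (all names and sample data are hypothetical). Each vertex keeps the ids of its outgoing and incoming edges, and each edge records its tail, head and label, so neighbours can be looked up by relationship type in either direction.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)
    out_edges: List[str] = field(default_factory=list)  # ids of outgoing edges
    in_edges: List[str] = field(default_factory=list)   # ids of incoming edges


@dataclass
class Edge:
    id: str
    tail: str   # id of the outgoing (tail) vertex
    head: str   # id of the incoming (head) vertex
    label: str  # type of relationship, e.g. "knows"
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        self.edges: Dict[str, Edge] = {}

    def add_vertex(self, vid: str, **props: Any) -> Vertex:
        if vid in self.vertices:
            raise ValueError(f"duplicate vertex id {vid!r}")  # ids must be unique
        v = Vertex(vid, dict(props))
        self.vertices[vid] = v
        return v

    def add_edge(self, eid: str, tail: str, label: str, head: str, **props: Any) -> Edge:
        if eid in self.edges:
            raise ValueError(f"duplicate edge id {eid!r}")
        e = Edge(eid, tail, head, label, dict(props))
        self.edges[eid] = e
        self.vertices[tail].out_edges.append(eid)
        self.vertices[head].in_edges.append(eid)
        return e

    def out_neighbors(self, vid: str, label: str) -> List[str]:
        """Heads of the outgoing edges of `vid` that carry `label`."""
        return [self.edges[e].head for e in self.vertices[vid].out_edges
                if self.edges[e].label == label]

    def in_neighbors(self, vid: str, label: str) -> List[str]:
        """Tails of the incoming edges of `vid` that carry `label`."""
        return [self.edges[e].tail for e in self.vertices[vid].in_edges
                if self.edges[e].label == label]


g = PropertyGraph()
g.add_vertex("1", name="alice")
g.add_vertex("2", name="bob")
g.add_vertex("3", name="graph-demo", kind="project")
g.add_edge("e1", "1", "knows", "2", since=2010)
g.add_edge("e2", "1", "created", "3")
print(g.out_neighbors("1", "knows"))   # ['2']
print(g.in_neighbors("3", "created"))  # ['1']
```

Keeping both edge lists on every vertex costs some bookkeeping on insert, but it turns a step in either direction into a local lookup instead of a scan over all edges.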
10. NO-SQL (Not-only SQL) Movement
NO-SQL DEFINITION:
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.
Began in early 2009 and is growing rapidly.
Characterized by:
schema-free, ✔Neo4j
easy replication support, ✔Neo4j
simple API, ✔Neo4j
eventually consistent / BASE (not ACID), ✗Neo4j conforms to ACID!
a huge amount of data,
…and more.
@see: http://nosql-database.org/
12. Neo4j
“An embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.”
– Neo Technology
13. Neo4j: Introduction
Open-source codebase.
Baked-in licensing flexibility:
GPL: “If your app is free, Neo4j is free. If not, there is a fee.”
Feb 2010 – v1.0 released.
Neo Technology
CEO: Emil Eifrem (@see: http://www.youtube.com/watch?v=q9m_5xiGrf4)
14. Neo4j: Understanding the Architecture
@see: http://docs.neo4j.org/chunked/stable/tutorials.html
15. Neo4j: An Accessible API
Updating operations:
Transaction Wrapper (ACID):
@see: http://docs.neo4j.org/chunked/stable/tutorials.html
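The slide's code did not survive extraction. For reference, the Neo4j 1.x embedded Java API wrapped updates as `Transaction tx = graphDb.beginTx(); try { ...; tx.success(); } finally { tx.finish(); }`. The Python sketch below illustrates only the atomicity part of that idea with a hypothetical `InMemoryStore`: writes go to a private copy that replaces the committed state only if the block finishes without an exception. It makes no attempt at isolation or durability.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator


class InMemoryStore:
    """Hypothetical toy key/value store whose writes become visible only on commit."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    @contextmanager
    def transaction(self) -> Iterator[Dict[str, Any]]:
        # Work on a private (shallow) copy: assign keys rather than mutating
        # nested values in place, or the change would leak into self.data.
        pending = dict(self.data)
        yield pending
        # Reached only if the with-block raised nothing: commit all changes at once.
        # If it raised, the exception propagates and self.data is left untouched.
        self.data = pending


store = InMemoryStore()
with store.transaction() as tx:
    tx["alice"] = {"name": "Alice"}

try:
    with store.transaction() as tx:
        tx["bob"] = {"name": "Bob"}
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass  # the whole second transaction is rolled back

print(sorted(store.data))  # ['alice']
```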
16. Neo4j: Querying via Pattern Matching
“A Traversal —navigates→ a Graph; it —
identifies→ Paths —which order→ Nodes”
Functional front-end manager application:
HTTP console (uses REST)
Cypher (a declarative graph query language)
Gremlin (an imperative, XPath-oriented, Turing-complete graph programming language)
Result visualizations
Query framework plugins for specific use cases:
e.g. SPARQL
@see: http://docs.neo4j.org/chunked/stable/tutorials.html
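The traversal view ("a Traversal navigates a Graph; it identifies Paths which order Nodes") can be sketched as a breadth-first walk that emits each path it discovers as an ordered list of nodes. Assumptions: a hypothetical adjacency-list graph, a single relationship type to follow, a depth limit, and visit-each-node-once uniqueness. Neo4j's traversal framework makes ordering, relationship filtering, depth evaluation and uniqueness configurable; this sketch hard-codes one choice of each.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple

# Hypothetical adjacency list: node -> list of (relationship type, neighbour).
Adjacency = Dict[str, List[Tuple[str, str]]]


def traverse(graph: Adjacency, start: str, rel_type: str,
             max_depth: int) -> Iterator[List[str]]:
    """Breadth-first traversal yielding each path (an ordered node list) from `start`.

    Only relationships of `rel_type` are followed, each node is visited at most
    once, and paths are not extended beyond `max_depth` relationships.
    """
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        yield path
        if len(path) - 1 == max_depth:
            continue  # depth limit reached; do not expand this path
        for rel, neighbour in graph.get(path[-1], []):
            if rel == rel_type and neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])


social: Adjacency = {
    "alice": [("KNOWS", "bob"), ("LIKES", "neo4j")],
    "bob": [("KNOWS", "carol")],
    "carol": [("KNOWS", "dave")],
}
for p in traverse(social, "alice", "KNOWS", max_depth=2):
    print(" -> ".join(p))
# alice
# alice -> bob
# alice -> bob -> carol
```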
17. Neo4j: Querying with Cypher (in Java!)
@see: http://docs.neo4j.org/chunked/stable/tutorials-cypher-java.html
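The slide's Java example did not survive extraction. Conceptually, a Cypher pattern such as `(a)-[:KNOWS]->(b)-[:KNOWS]->(c)` declares the shape to find and leaves the search strategy to the engine. The sketch below is a hypothetical, naive matcher for such linear chains over an edge list, extending partial matches one hop at a time; it is not how Cypher is implemented and applies no uniqueness rules.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (tail, relationship type, head)

# Hypothetical sample data.
EDGES: List[Edge] = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("alice", "KNOWS", "dave"),
    ("dave", "WORKS_WITH", "carol"),
]


def match_chain(edges: List[Edge], rel_types: List[str]) -> List[List[str]]:
    """Return every node sequence n0, n1, ..., nk with an edge
    (n[i], rel_types[i], n[i+1]) for each i, i.e. a chain pattern
    (n0)-[:T0]->(n1)-[:T1]->(n2)... in Cypher-style notation."""
    # One-hop matches for the first relationship type.
    rows = [[t, h] for t, r, h in edges if r == rel_types[0]]
    for rel in rel_types[1:]:
        # Index this hop's edges by tail so each partial match extends by lookup.
        by_tail: Dict[str, List[str]] = {}
        for t, r, h in edges:
            if r == rel:
                by_tail.setdefault(t, []).append(h)
        rows = [row + [h] for row in rows for h in by_tail.get(row[-1], [])]
    return rows


print(match_chain(EDGES, ["KNOWS", "KNOWS"]))       # [['alice', 'bob', 'carol']]
print(match_chain(EDGES, ["KNOWS", "WORKS_WITH"]))  # [['alice', 'dave', 'carol']]
```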
18. Neo4j: The Good
Server-side Plugins
SPARQL plugin for Semantic Pattern Matching of RDF triples.
Caching plugins
Visualization plugins
Big-Data plugins
Many more, and growing…
19. Neo4j: The Bad
RESTful calls to a standalone server are slow compared with the embedded (in-process) API.
Cypher is a whole new language.
SPARQL support is in its infancy.
Scaling up requires some imaginative re-tooling of the Property Graph model.
Indexing is limited to Nodes and Relationships.
http://www.scribd.com/doc/2670985/SQL-Antipatterns
Comma Separated Columns
Multi-Attribute Tables
Entity Attribute Value
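Entity Attribute Value stores each attribute of an entity as its own (entity, attribute, value) row, so reading an entity back requires a pivot (in SQL, typically one join or conditional aggregate per attribute), and the schema can no longer enforce types or required attributes. A small illustration with made-up rows:

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Entity-Attribute-Value: every attribute is a separate row in one generic table.
eav_rows: List[Tuple[int, str, Any]] = [
    (1, "name", "Alice"),
    (1, "email", "alice@example.com"),
    (2, "name", "Bob"),
    (2, "age", 42),  # nothing stops "age" from being a string on another row
]


def pivot(rows: List[Tuple[int, str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Reassemble each entity from its scattered attribute rows."""
    entities: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for entity, attribute, value in rows:
        entities[entity][attribute] = value
    return dict(entities)


print(pivot(eav_rows))
# {1: {'name': 'Alice', 'email': 'alice@example.com'}, 2: {'name': 'Bob', 'age': 42}}
```

For comparison, the property-graph model keeps exactly this kind of per-entity key/value map directly on each vertex, with no pivot step.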
Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
Consistent: A transaction cannot leave the database in an inconsistent state.
Isolated: Transactions cannot interfere with each other.
Durable: Committed transactions persist, even across server restarts or crashes.
Basic Availability
Soft-state
Eventual consistency
Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. (Accounting systems do this all the time. It’s called “closing out the books.”) It’s OK to use stale data, and it’s OK to give approximate answers.
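Eventual consistency can be illustrated with two replicas that accept writes independently and reconcile later. In this hedged sketch, each value carries an integer timestamp, conflicts are resolved by last-write-wins, and a hypothetical `merge` step plays the role of an anti-entropy exchange; real systems need more careful clocks or version vectors. Between a write and the next merge, a reader of the other replica sees stale data, which is exactly what BASE accepts.

```python
from typing import Dict, Tuple

# Each replica maps key -> (timestamp, value). Plain integer timestamps are an
# assumption of this sketch; clock skew and version vectors are not modelled.
Replica = Dict[str, Tuple[int, str]]


def write(replica: Replica, key: str, value: str, ts: int) -> None:
    """Accept a write locally, without coordinating with other replicas.

    Last-write-wins: the (timestamp, value) pair that compares greatest is kept,
    so ties on timestamp are broken deterministically by value.
    """
    current = replica.get(key)
    if current is None or (ts, value) > current:
        replica[key] = (ts, value)


def merge(a: Replica, b: Replica) -> None:
    """Anti-entropy exchange: both replicas keep the newest version of every key."""
    for key in set(a) | set(b):
        newest = max(v for v in (a.get(key), b.get(key)) if v is not None)
        a[key] = b[key] = newest


r1: Replica = {}
r2: Replica = {}
write(r1, "balance", "100", ts=1)
write(r2, "balance", "80", ts=2)       # a later update lands on the other replica
print(r1["balance"][1], r2["balance"][1])  # 100 80
merge(r1, r2)
print(r1 == r2, r1["balance"][1])          # True 80
```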