Graph-like power
Roman R.
MATCH (a:Actor),(m:Movie)
WHERE a.name ='Keanu Reeves'
AND m.title='The Matrix'
CREATE (a)-[:ACTS_IN]->(m)
Today
○ Graphs in NoSQL world
○ classification
○ definition
○ components
○ Neo4j
○ nodes, rels, props, indexes
○ Cypher
○ PHP and Neo4j
○ Demo
○ Alternatives
○ Q/A
NoSQL Databases
● Key-Value: MemcacheDB, Redis, Riak, Tokyo Cabinet
● Document: MongoDB, CouchDB, RavenDB, Elasticsearch
● Graph: Neo4j, TITAN, OrientDB, InfiniteGraph, AllegroGraph
● Column (BigTable): Cassandra, HBase/Hadoop
What is a Graph in math
● represents a set of connected objects
● graph:
○ vertex (node/point)
○ edge (arc/line/relationship/arrow) - undirected
○ attribute (property) - on node/relationship
● types:
○ undirected graph: G = (V, E)
○ digraph (directed graph): D = (V, A)
○ mixed graph: G = (V, E, A)
V = {1, 2, 3, 4, 5, 6}
E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
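The sets V and E above can be turned into an adjacency structure with a few lines of code. The following is a minimal Python sketch, not part of the original slides; the function name build_adjacency is an assumption made for illustration.

```python
# Vertex and edge sets copied from the slide (an undirected graph).
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def build_adjacency(vertices, edges):
    """Return a dict mapping each vertex to the set of its neighbours."""
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        # Undirected edge: record the connection in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(V, E)
# Degree of a vertex = number of edges touching it.
degrees = {v: len(adjacency[v]) for v in sorted(adjacency)}

print(sorted(adjacency[4]))
print(degrees)
```

Every edge contributes to the degree of two vertices, so the degrees always sum to twice the number of edges.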
What is a Graph database
● stores data in a graph and can retrieve vast networks of data
● shines when storing richly-connected data
● consists of nodes, connected by relationships
○ A Graph —records data in→ Nodes —which have→ Properties
○ Nodes —are organized by→ Rels —which also have→ Properties
○ Nodes —are grouped by→ Labels —into→ Sets
○ A Traversal —navigates→ a Graph
it —identifies→ Paths —which order→ Nodes
○ An Index —maps from→ Properties —to either→ Nodes or Rels
○ A Graph Database —manages a→ Graph and
—also manages related→ Indexes
Nodes, Rels, Props, Labels
A Graph
—records data in→ Nodes
—which have→ Properties
Nodes
—are organized by→ Relationships
—which also have→ Properties
Nodes
—are grouped by→ Labels
—into→ Sets
Graph Traversal
A Traversal
—navigates→ a Graph
it
—identifies→ Paths
—which order→ Nodes
what music
do my friends like
that I don’t yet own
if this power supply goes down,
what web services
are affected?
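The first question above ("what music do my friends like that I don't yet own") is a two-hop traversal followed by a set difference. Here is a rough Python sketch of that idea over a toy in-memory graph; the node names, the relationship types FRIEND, LIKES and OWNS, and the helper functions are all hypothetical and are not Neo4j API calls.

```python
# Hypothetical toy graph: (start node, relationship type) -> set of end nodes.
graph = {
    ("me", "FRIEND"): {"ann", "bob"},
    ("me", "OWNS"): {"Album A"},
    ("ann", "LIKES"): {"Album A", "Album B"},
    ("bob", "LIKES"): {"Album B", "Album C"},
}


def follow(node, rel_type):
    """Follow all outgoing relationships of one type from a node."""
    return graph.get((node, rel_type), set())


def recommend(person):
    """Music liked by the person's friends that the person does not own."""
    owned = follow(person, "OWNS")
    liked_by_friends = set()
    for friend in follow(person, "FRIEND"):      # first hop: friends
        liked_by_friends |= follow(friend, "LIKES")  # second hop: their music
    return liked_by_friends - owned


recommendations = sorted(recommend("me"))
print(recommendations)
```

Only the part of the graph reachable from the start node is visited, which is why such queries stay cheap even when the whole graph is large.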
Graph Index
An Index
—maps from→ Properties
—to either→ Nodes or Rels
find the Account
for username master-of-graphs
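Conceptually, such an index is a lookup table from a property value to the nodes that carry it. The sketch below illustrates this in plain Python; the node ids, the second account and the build_index helper are assumptions for illustration only, and Neo4j's real indexes are backed by Lucene rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical nodes: id -> properties.
nodes = {
    1: {"label": "Account", "username": "master-of-graphs"},
    2: {"label": "Account", "username": "alice"},
    3: {"label": "Movie", "title": "The Matrix"},
}


def build_index(nodes, key):
    """Map each value of the given property key to the ids of nodes having it."""
    index = defaultdict(set)
    for node_id, props in nodes.items():
        if key in props:
            index[props[key]].add(node_id)
    return index


username_index = build_index(nodes, "username")
# Look up the account without scanning every node.
found = username_index.get("master-of-graphs", set())
print(found)
```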
Graph
A Graph Database
—manages a→ Graph and
—also manages related→ Indexes
What a Graph database looks like
A Graph Database transforms a RDBMS
A Graph Database elaborates a Key-Value Store
K* = key
V* = value
A Graph Database relates Column-Family
● BigTable databases are an evolution of key-value stores,
using "families" to allow grouping of rows
● stored in a graph, the families could become
hierarchical, and the relationships among data
become explicit
A Graph Database navigates a Document Store
D=Document,
S=Subdocument,
V=Value,
D2/S2 = reference
NoSQL Data Models
90% of all use cases
Relational Databases
Neo4j features
● intuitive, using a graph model for data representation
● reliable, fully transactional, upholds ACID
● durable and fast, using a custom disk-based, native storage engine
● massively scalable, up to several billion nodes/relationships/properties
● highly available, when distributed across multiple machines
● expressive, with a powerful, human-readable declarative graph query language
● fast, with a powerful traversal framework for high-speed graph queries
● embeddable, with a few small jars
● simple, accessible through a convenient REST API or an object-oriented Java API
● indexes are based on Apache Lucene; secondary indexes are supported
● in commercial development for 10 years and in production for over 7 years (since 2003)
● cross-platform; simple set-up; well documented; open source
● GPL for Community, AGPL for Enterprise
Neo4j requirements
● CPU - Intel Core i3/i7
● Memory - 2GB .. 16/32GB
● Disk - 10GB SATA .. SSD w/ SATA
● Filesystem - ext4 .. ext4/ZFS
● Software - Oracle Java 7
Neo4j license
● Neo4j Community
○ Open-Source High Performance
○ fully ACID transactional graph database
● Neo4j Enterprise
○ High-Performance Cache (up to 10x faster)
○ Horizontal scalability with Neo4j Clustering (predictable scalability)
○ High-availability and online backups
○ Cache based sharding (shard your graph in memory)
○ Advanced Monitoring (operational metrics)
○ Certified for Windows and Linux
○ Email/Phone Support (10x5, 24x7 hours)
○ Subscriptions
■ Personal (up to 3 devs, $100k annual revenue) = FREE
■ Startups (<$10M funding, <$5M annual revenue) = $12k
■ Business (medium, to Global 2000) = Contact Sales
Neo4j vs. MySQL
● for the simple friends-of-friends query, Neo4j is 60% faster than MySQL
● for friends of friends of friends, Neo4j is 180 times faster
● for the depth-four query, Neo4j is 1,135 times faster
● MySQL just chokes on the depth-five query
Neo4j: Nodes
● fundamental units that form a graph
● can have key/value-style properties
● nodes and relationships can be indexed
by {key, value} pairs
● represent entities
Neo4j: Relationships #1/2
● connect entities and structure domain
● allow for finding related data
● are always directed (outgoing or incoming)
● are equally well traversed in either direction
● a node can have a relationship to itself
● have a relationship type (label)
Neo4j: Relationships #2/2
Neo4j: Properties
● nodes and relationships can have properties
● are key-value pairs
○ key is a string
○ values can be either a primitive or an array of
one primitive type
■ boolean, String, int, int[], etc.
■ types follow the Java Language Specification
● describe entity attributes, relationship qualities,
and metadata
Neo4j: Labels
● used to group nodes into sets
● any number of labels, including none
● can be added and removed during runtime
● can be used to mark temporary states for nodes
● names case-sensitive
● CamelCase (convention)
Neo4j: Paths
● a path is one or more nodes with connecting relationships
● shortest possible path (length zero, a single node):
● a path of length one:
● another path of length one:
Neo4j: Traversal
● Traversal Framework out of the box
● means visiting nodes, following relationships by rules
● in most cases only a subgraph is visited
● callback based traversal API
○ you can specify the traversal rules
● traversing breadth- or depth-first
● open Java API
Neo4j: graph algorithms
● A*: finds the cheapest path between two nodes
● Dijkstra (dijkstra): finds the cheapest path between two nodes
● PathWithLength: finds all paths of a certain length (depth) between two nodes
● Shortest paths (shortestPath, the default): finds all the shortest paths between two nodes
● All simple paths (allSimplePaths): finds all simple paths (without loops) between two nodes
● All paths (allPaths): finds all available paths between two nodes
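To make "cheapest path" concrete, here is a small Python sketch of Dijkstra's algorithm using a priority queue. It is an illustration of the technique only, not Neo4j's implementation; the function signature and the sample graph with its node names and costs are hypothetical.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total_cost, path) for the cheapest path, or None if unreachable.

    graph maps a node to a list of (neighbour, cost) pairs; costs must be >= 0.
    """
    queue = [(0, source, [source])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)  # cheapest candidate first
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical weighted, directed graph.
roads = {
    "A": [("B", 7), ("C", 2)],
    "B": [("D", 1)],
    "C": [("B", 3), ("D", 8)],
}
result = dijkstra(roads, "A", "D")
print(result)
```

In this sample the direct-looking routes A-B-D (cost 8) and A-C-D (cost 10) both lose to A-C-B-D (cost 6), which is the kind of result a hop-counting shortest path would miss.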
Neo4j: Schema
● Neo4j is a schema-optional graph database
Neo4j: Index
● introduced in Neo4j 2.0
● eventually available (populated in the background, so
not immediately available for querying)
○ comes online once fully populated
○ on failed status, drop and recreate the index
● can be created for a group of nodes sharing a label
● indexes cover Nodes & Rels
● node_auto_indexing=false,
node_keys_indexable
Neo4j: Constraints
● can help you keep your data clean
● specify the rules for what your data should
look like
● uniqueness constraints are the only available
constraint type
Neo4j: Data Size
● single server instance
○ nodes = 2^35 (~34 billion)
○ relationships = 2^35 (~34 billion)
○ labels = 2^31 (~2 billion)
○ properties = 2^36 to 2^38 depending on
property types (maximum ~274 billion, always
at least ~68 billion)
○ relationship types = 2^15 (~32’000)
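The rounded figures on this slide follow directly from the powers of two. A short Python check, added here as an illustration (the dictionary keys are just labels chosen for this sketch):

```python
# Upper bounds quoted on the slide, expressed as exact powers of two.
limits = {
    "nodes": 2 ** 35,
    "relationships": 2 ** 35,
    "labels": 2 ** 31,
    "properties (at least)": 2 ** 36,
    "properties (maximum)": 2 ** 38,
    "relationship types": 2 ** 15,
}

for name, value in limits.items():
    # Thousands separators make the magnitude easy to compare with the slide.
    print(f"{name}: {value:,}")
```

For example, 2^35 is 34,359,738,368 (the "~34 billion" on the slide) and 2^15 is exactly 32,768.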
Cypher Query Language
“Makes the simple things easy, and the complex things possible”
● powerful graph query language
● relatively simple
● declarative grammar (say what you want, not how)
● humane query language
● self-explanatory (based on English prose and neat iconography)
● written in Scala
● pattern-matching (borrows expression approaches from SPARQL)
● aggregation, ordering, limits
● create, update, delete
● structure and most of the keywords are inspired by SQL
● changing rather rapidly (CYPHER 1.9 START ...)
Cypher patterns #1/2
● (a)
● (b)
● (a)-->(b)
● (a)-->(b)-->(c)
● (b)-->(c)<--(a)
● (b)-->()<--(a)
● (a)--(b)
● (a)-[*5]->(b)
● (a)-[*3..5]->(b)
○ (a)-[*3..]->(b)
○ (a)-[*..5]->(b)
○ (a)-[*]->(b)
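The variable-length patterns above match paths whose number of relationships falls within a range. The Python sketch below mimics that behaviour on a toy edge list so the semantics are easier to see; it is not how Cypher is implemented, the function name and sample edges are hypothetical, and it assumes (as Cypher does within a single pattern) that a relationship is not reused inside one path.

```python
def variable_length_paths(edges, start, end, min_hops, max_hops):
    """Yield each directed path start -> end that uses min_hops..max_hops edges."""

    def walk(node, path, used):
        hops = len(path) - 1
        if node == end and hops >= min_hops:
            yield list(path)
        if hops == max_hops:
            return  # do not extend beyond the upper bound
        for edge in edges:
            src, dst = edge
            if src == node and edge not in used:
                yield from walk(dst, path + [dst], used | {edge})

    yield from walk(start, [start], frozenset())


# Hypothetical directed relationships.
edges = [("a", "x"), ("x", "b"), ("a", "b"), ("x", "y"), ("y", "b")]

# Roughly what (a)-[*1..2]->(b) would match on this graph.
paths_1_to_2 = sorted(variable_length_paths(edges, "a", "b", 1, 2))
# Roughly what (a)-[*3]->(b) would match.
paths_3 = sorted(variable_length_paths(edges, "a", "b", 3, 3))
print(paths_1_to_2)
print(paths_3)
```

Leaving the upper bound open, as in `[*3..]` or `[*]`, can make a query explore a very large part of the graph, so a bounded range is usually the safer choice.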
Cypher patterns #2/2
● (a:Label)-->(m)
● (a:User:Admin)-->(m)
● (a)--(m)
● (a)-[r]->(m)
● (a)-[:ACTED_IN]->(m)
● (a)-[r:SOME|ELSE|WTH]->(m)
Cypher: START / RETURN
“It all starts with the START”
Michael Hunger, Cypher webinar, Sep 2012
● designates the start points
● START is optional (in Neo4j >= 2.0)
Examples:
● START <lookup> RETURN <expression>
● START n=node(0) RETURN n
● START n=node(*) RETURN n.name
Cypher: MATCH
● primary way of getting data from the database
● START <lookup> MATCH <pattern> RETURN <expr>
● OPTIONAL MATCH <pattern> RETURN <expr>
Examples:
● MATCH (n) RETURN count(n)
● MATCH (actor:Actor) RETURN actor.name;
● START me=node(0) MATCH (me)--(f) RETURN f.name
● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO
Cypher: CREATE / SET
● creates nodes and relationships
● CREATE (<name>[:label] [properties,..])
● CREATE (<node-in>)-[<var>:RELATION [properties,..]]->(<node-out>);
● CREATE UNIQUE ...
Examples:
● CREATE (n:Actor { name:"Keanu Reeves" });
● CREATE (keanu)-[:ACTED_IN]->(matrix)
● MATCH (keanu {name:".."}) SET keanu.age=49 RETURN keanu
Cypher: WHERE
● filters the results
● MATCH <pattern> WHERE <condition> RETURN <expr>
Examples:
● WHERE n.name =~ "(?i)John.*"
● WHERE NOT ..
● WHERE type(rel) =~ "Perso.*"
Cypher: RETURN
● creates the result table
● any query can return data
● can be nodes, relationships, or properties on these
● RETURN DISTINCT <expression> AS x
● RETURN aggregate(expr) as alias
● RETURN nodes, rels, properties
● RETURN expressions of funcs and operators
● RETURN aggregation funcs on the above
Cypher: etc
● CASE / WHEN / ELSE
● ORDER BY node.key, node2.key, .. ASC|DESC
● LIMIT / SKIP
● WITH (WITH count(*) as c)
● UNION / UNION ALL (combining results from multiple queries)
● USING INDEX/SCAN
● MERGE / SET / DELETE / REMOVE / FOREACH
● Expressions
● Operators
● Comments
● Functions: ALL, ANY, LENGTH, {Math}, {String}, ...
Cypher: Transactions
● any updating query will run in a transaction
● ACID
● “it is very important to finish each transaction”
● write lock on node/rel:
○ adding, changing or removing a prop on a node/rel
● write lock on node:
○ creating or deleting a node
● write lock on the relationship and both its nodes:
○ creating or deleting a relationship
Cypher: Aggregation
● count(node/rel/prop)
● count(n), count(n.prop)
● sum(n.prop)
● avg(n.prop)
● percentileDisc(n.prop, 0.5) - the median
● stdev(n.prop) - standard deviation for the group
● max(n.prop)
● collect(n.prop)
● RETURN n, count(*)
Cypher: back to SQL #1/5
● SELECT *
FROM Person
WHERE name = 'Valentin' AND age > 30
● START person=node:Person(name="Valentin")
WHERE person.age > 30
RETURN person
Cypher: back to SQL #2/5
● SELECT "Email".*
FROM "Person"
JOIN "Email" ON "Person".id = "Email".person_id
WHERE "Person".name = 'Benedikt'
● START person=node:Person(name="Benedikt")
MATCH person-[:email]->email
RETURN email
Cypher: back to SQL #3/5
● show me all people that are both actors and
directors
● SELECT name FROM Person
WHERE
person_id IN (SELECT person_id FROM Actor) AND
person_id IN (SELECT person_id FROM Director)
● START person=node:Person("name:*")
WHERE (person)-[:ACTS_IN]->()
AND (person)-[:DIRECTED]->()
RETURN person.name
Cypher: back to SQL #4/5
● show me all Tom Hanks’s co-actors
● SELECT DISTINCT co_actor.name FROM Person tom
JOIN Actor a1 ON tom.person_id = a1.person_id
JOIN Actor a2 ON a1.movie_id = a2.movie_id
JOIN Person co_actor ON co_actor.person_id = a2.person_id
WHERE tom.name = 'Tom Hanks'
● START tom=node:Person(name="Tom Hanks")
MATCH tom-[:ACTS_IN]->movie,
co_actor-[:ACTS_IN]->movie
RETURN DISTINCT co_actor.name
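The co-actor query is a two-step walk: from the person to their movies, then from those movies back to everyone else who acted in them. The following Python sketch spells that out over a small list of ACTS_IN pairs; the co_actors function and the sample rows are illustrative assumptions, not data from the deck.

```python
# Sample (actor, movie) pairs standing in for ACTS_IN relationships.
acts_in = [
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Tom Hanks", "Cast Away"),
    ("Helen Hunt", "Cast Away"),
    ("Keanu Reeves", "The Matrix"),
]


def co_actors(acts_in, name):
    """Distinct actors who share at least one movie with the given actor."""
    # Step 1: movies the actor appears in (tom-[:ACTS_IN]->movie).
    movies = {movie for actor, movie in acts_in if actor == name}
    # Step 2: other actors in those movies (co_actor-[:ACTS_IN]->movie).
    return {actor for actor, movie in acts_in if movie in movies and actor != name}


toms_co_actors = sorted(co_actors(acts_in, "Tom Hanks"))
print(toms_co_actors)
```

Using a set plays the role of DISTINCT: an actor who shares several movies with the person still appears once.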
Cypher: back to SQL #5/5
● show me all Lucy’s favorite directors
● SELECT dir.name, count(*) FROM Person lucy
JOIN Actor ON lucy.person_id = Actor.person_id
JOIN Director ON Actor.movie_id = Director.movie_id
JOIN Person dir ON Director.person_id = dir.person_id
WHERE lucy.name = 'Lucy Liu'
GROUP BY dir.name
ORDER BY count(*) DESC
● START lucy=node:Person(name="Lucy Liu")
MATCH lucy-[:ACTS_IN]->movie,
director-[:DIRECTED]->movie
RETURN director.name, count(*)
ORDER BY count(*) DESC
Cypher: back to SQL #6/5
START
lucy = node:Person(name="Lucy Liu"),
kevin = node:Person(name="Kevin Bacon")
MATCH
p = shortestPath( lucy-[:ACTS_IN*]-kevin )
RETURN
EXTRACT (n in NODES(p):
COALESCE(n.name?, n.title?))
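A shortestPath query over unweighted ACTS_IN relationships amounts to a breadth-first search that ignores direction. Below is a minimal Python sketch of that search; the people and movies in the sample data are placeholders invented for illustration, and the function is not Neo4j's implementation.

```python
from collections import deque


def shortest_path(acts_in, start, goal):
    """Breadth-first search over actor/movie pairs, ignoring direction."""
    neighbours = {}
    for actor, movie in acts_in:
        neighbours.setdefault(actor, set()).add(movie)
        neighbours.setdefault(movie, set()).add(actor)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path  # first hit in BFS order has the fewest hops
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Placeholder data: names are hypothetical, not real filmographies.
acts_in = [
    ("Lucy", "Movie 1"),
    ("Actor X", "Movie 1"),
    ("Actor X", "Movie 2"),
    ("Kevin", "Movie 2"),
    ("Lucy", "Movie 3"),
]
path = shortest_path(acts_in, "Lucy", "Kevin")
print(path)
```

The returned list alternates between people and movies, which is what the EXTRACT / COALESCE expression in the query above produces by taking either the name or the title of each node on the path.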
Neo4j Shell
● command-line shell for running Cypher queries
● supports remote shell
● :schema
● bash# neo4j-shell -path data/graph.db -readonly
-config conf/neo4j.properties
-c "<command>"
Neo4j: Security
● does not deal with data encryption
explicitly
● all security means built into Java can be used
● an encrypted datastore can be used
● webadmin over HTTPS
SPARQL
● manipulates data stored in RDF format
● focused on matching sets of triples
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
?person a foaf:Person.
?person foaf:name ?name.
?person foaf:mbox ?email.
}
Gremlin
● graph traversal language
● scripting language
● Pipe & Filter (similar to jQuery)
● works across different graph databases
● based on Groovy (limited to Java)
● not as stable in Neo4j
● XPath like
● ./outE[label="family"]/inV/@name
● g.v(1).out('likes').in('likes').out('likes').groupCount(m)
● g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)
Neo4j and PHP
● everyman/neo4jphp (on packagist.org)
○ PHP wrapper for Neo4j using the REST interface
○ follows the PSR-0 autoloading standard
○ basic wrappers for all components
○ last update - a month ago
○ supports Gremlin
● Neo4j-PHP OGM (a lot of it is based on the wrapper above)
○ Object Graph Mapper, inspired by Doctrine
○ based on Doctrine Common
○ borrows significantly from the Doctrine ORM design
○ uses annotations on classes
○ MIT Licence
● Neo4J PHP REST API client
○ uses the Neo4j REST API
○ Node create/find/delete
○ Relationship create/list/filter
High Availability with Neo4j
● in HA mode there is a single master and zero or more slaves
● slaves synchronize with the master to preserve
consistency
● the master writes to a slave before the transaction completes
Demo
Neo4j.org Example Datasets:
● DrWho (nodes=1'060; rels=2'286)
● Cineasts Movies & Actors (nodes=64'069; rels=121'778)
● Hubway Data Challenge (nodes=554'674; rels=2'011'904)
GraphGist:
● JIRA and neo4j
● PHP and neo4j
● Kant in neo4j
XSS
Gephi (win, nix, mac)
Linkurious.us
Neoclipse (eclipse plugin)
KeyLines (JavaScript library)
Graffeine (npm package)
Neovigator (neography + processing.js)
Cloud
● Heroku
○ GrapheneDB beta
○ bash$ heroku addons:add graphenedb
● Jelastic Cloud PaaS
Analogues
● GrapheneDB - based on neo4j
● AllegroGraph - Closed Source, Commercial, RDF-QuadStore
○ graph database built around the W3C spec for the Resource
Description Framework
○ supports SPARQL, RDFS++, and Prolog
● Sones - Closed Source, .NET focused
● Virtuoso - Closed Source, RDF focused
● GraphDB - graph database built in .NET by the German company sones
● InfiniteGraph - goal is to create a graph database with "virtually
unlimited scalability."
● FlockDB
Docs
● http://docs.neo4j.org/chunked/snapshot/
● http://docs.neo4j.org/refcard/2.0/
● http://graphdatabases.com/ - book, O'REILLY
● http://www.cs.usfca.edu/~galles/visualization/Algorithms.html - Graph
Algorithms visualization
● http://bit.ly/rr-neo4j
● https://github.com/itspoma/test-neo4j
Conclusion
● best used for graph-style,
rich or complex,
structured dense data,
deep graphs with unlimited depth and cycles,
with weighted connections,
interconnected data
● quickly add new functionality without impacting
existing deployments
● being schema-less forces you to re-think your entire approach to data
● not a silver bullet for all problems
Neo4j: Graph-like power
Neo4j: Graph-like power

More Related Content

What's hot

Big Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and ClojureBig Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and Clojure
Dr. Christian Betz
 
MongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL DatabaseMongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL Database
Ruben Inoto Soto
 
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Holden Karau
 

What's hot (20)

Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
Spark ML for custom models - FOSDEM HPC 2017
Spark ML for custom models - FOSDEM HPC 2017Spark ML for custom models - FOSDEM HPC 2017
Spark ML for custom models - FOSDEM HPC 2017
 
Beyond shuffling - Scala Days Berlin 2016
Beyond shuffling - Scala Days Berlin 2016Beyond shuffling - Scala Days Berlin 2016
Beyond shuffling - Scala Days Berlin 2016
 
Big Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and ClojureBig Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and Clojure
 
A super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAMA super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAM
 
MongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL DatabaseMongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL Database
 
Scaling with apache spark (a lesson in unintended consequences) strange loo...
Scaling with apache spark (a lesson in unintended consequences)   strange loo...Scaling with apache spark (a lesson in unintended consequences)   strange loo...
Scaling with apache spark (a lesson in unintended consequences) strange loo...
 
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
 
Beyond Shuffling - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...
Beyond Shuffling  - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...Beyond Shuffling  - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...
Beyond Shuffling - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...
 
Introduction to and Extending Spark ML
Introduction to and Extending Spark MLIntroduction to and Extending Spark ML
Introduction to and Extending Spark ML
 
Spark with Elasticsearch
Spark with ElasticsearchSpark with Elasticsearch
Spark with Elasticsearch
 
Java Performance Tips (So Code Camp San Diego 2014)
Java Performance Tips (So Code Camp San Diego 2014)Java Performance Tips (So Code Camp San Diego 2014)
Java Performance Tips (So Code Camp San Diego 2014)
 
Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic Web
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
Graph Databases & OrientDB
Graph Databases & OrientDBGraph Databases & OrientDB
Graph Databases & OrientDB
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
 

Viewers also liked

201301 - Focus Neo4j
201301 - Focus Neo4j201301 - Focus Neo4j
201301 - Focus Neo4j
lyonjug
 
Keanu reeves
Keanu reevesKeanu reeves
Keanu reeves
ipenam
 
CS158: Final Project
CS158: Final ProjectCS158: Final Project
CS158: Final Project
Evan Casey
 
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013
Neo4j
 
Music Information Retrieval: Overview and Current Trends 2008
Music Information Retrieval: Overview and Current Trends 2008Music Information Retrieval: Overview and Current Trends 2008
Music Information Retrieval: Overview and Current Trends 2008
Rui Pedro Paiva
 

Viewers also liked (20)

[FRENCH] - Neo4j and Cypher - Remi Delhaye
[FRENCH] - Neo4j and Cypher - Remi Delhaye[FRENCH] - Neo4j and Cypher - Remi Delhaye
[FRENCH] - Neo4j and Cypher - Remi Delhaye
 
201301 - Focus Neo4j
201301 - Focus Neo4j201301 - Focus Neo4j
201301 - Focus Neo4j
 
Keanu reeves
Keanu reevesKeanu reeves
Keanu reeves
 
How To Do Music Recommendation
How To Do Music RecommendationHow To Do Music Recommendation
How To Do Music Recommendation
 
CS158: Final Project
CS158: Final ProjectCS158: Final Project
CS158: Final Project
 
Query By humming - Music retrieval technology
Query By humming - Music retrieval technologyQuery By humming - Music retrieval technology
Query By humming - Music retrieval technology
 
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013
Graph Adoption at Gamesys - Toby O'Rourke @ GraphConnect SF 2013
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
Ok shazam, "la la-lalaa"!
Ok shazam, "la la-lalaa"!Ok shazam, "la la-lalaa"!
Ok shazam, "la la-lalaa"!
 
Introduction to the graph technologies landscape
Introduction to the graph technologies landscapeIntroduction to the graph technologies landscape
Introduction to the graph technologies landscape
 
Introducing Neo4j 3.0
Introducing Neo4j 3.0Introducing Neo4j 3.0
Introducing Neo4j 3.0
 
Graphs for Enterprise Architects
Graphs for Enterprise ArchitectsGraphs for Enterprise Architects
Graphs for Enterprise Architects
 
Neo4j in Depth
Neo4j in DepthNeo4j in Depth
Neo4j in Depth
 
Music Information Retrieval: Overview and Current Trends 2008
Music Information Retrieval: Overview and Current Trends 2008Music Information Retrieval: Overview and Current Trends 2008
Music Information Retrieval: Overview and Current Trends 2008
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinIntro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
 
Working With a Real-World Dataset in Neo4j: Import and Modeling
Working With a Real-World Dataset in Neo4j: Import and ModelingWorking With a Real-World Dataset in Neo4j: Import and Modeling
Working With a Real-World Dataset in Neo4j: Import and Modeling
 
Base de données graphe et Neo4j
Base de données graphe et Neo4jBase de données graphe et Neo4j
Base de données graphe et Neo4j
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 

Similar to Neo4j: Graph-like power

Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine
 
Neo4j - Graph Database
Neo4j - Graph DatabaseNeo4j - Graph Database
Neo4j - Graph Database
Mubashar Iqbal
 
OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011
Antony T Curtis
 

Similar to Neo4j: Graph-like power (20)

Neo4j graph database
Neo4j graph databaseNeo4j graph database
Neo4j graph database
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4j
 
Graph databases
Graph databasesGraph databases
Graph databases
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013
 
NoSQL no more: SQL on Druid with Apache Calcite
NoSQL no more: SQL on Druid with Apache CalciteNoSQL no more: SQL on Druid with Apache Calcite
NoSQL no more: SQL on Druid with Apache Calcite
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
 
Getting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4jGetting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4j
 
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
 
Neo4j - Graph Database
Neo4j - Graph DatabaseNeo4j - Graph Database
Neo4j - Graph Database
 
OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011
 
Distributed computing with spark
Distributed computing with sparkDistributed computing with spark
Distributed computing with spark
 
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCRMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
managing big data
managing big datamanaging big data
managing big data
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
 

Recently uploaded

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
FIDO Alliance
 

Recently uploaded (20)

WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Design and Development of a Provenance Capture Platform for Data Science

Neo4j: Graph-like power

  • 1.
  • 2. Graph-like power Roman R. MATCH (a:Actor),(m:Movie) WHERE a.name ='Keanu Reeves' AND m.title='The Matrix' CREATE (a)-[:ACTS_IN]->(m)
  • 3. Today ○ Graphs in NoSQL world ○ classification ○ definition ○ components ○ Neo4j ○ nodes, rels, props, indexes ○ Cypher ○ PHP and Neo4j ○ Demo ○ Alternatives ○ Q/A 1
  • 5. What is a Graph in math 3 ● represent a connected set of objects ● graph: ○ vertex (node/points) ○ edge (arc/line/relationship/arrow) - undirected ○ attribute (property) - on node/relationship ● types: ○ pair: G = (V, E) ○ digraph: D = (V, A) ○ mixed: G = (V, E, A) V = {1, 2, 3, 4, 5, 6} E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
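A minimal Python sketch of the pair G = (V, E) from the slide above, turning the edge set into an adjacency map. V and E are taken from the slide; the helper name build_adjacency is illustrative only and not part of any library.

```python
# Vertex and edge sets exactly as listed on the slide (undirected graph G = (V, E)).
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def build_adjacency(vertices, edges):
    """Return {vertex: set of neighbours} for an undirected graph."""
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        # Undirected: record the edge in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(V, E)
# Degree of a vertex = number of edges touching it.
degrees = {v: len(neighbours) for v, neighbours in adjacency.items()}
```

For a digraph D = (V, A) the same structure would record only the a -> b direction for each arc.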
  • 6. What is a Graph database 4 ● stores data in a graph and retrieves vast networks of data ● shines when storing richly-connected data ● consists of nodes, connected by relationships ○ A Graph —records data in→ Nodes —which have→ Properties ○ Nodes —are organized by→ Rels —which also have→ Properties ○ Nodes —are grouped by→ Labels —into→ Sets ○ A Traversal —navigates→ a Graph it —identifies→ Paths —which order→ Nodes ○ An Index —maps from→ Properties —to either→ Nodes or Rels ○ A Graph Database —manages a→ Graph and —also manages related→ Indexes
  • 7. Nodes, Rels, Props, Labels 5 A Graph —records data in→ Nodes —which have→ Properties Nodes —are organized by→ Relationships —which also have→ Properties Nodes —are grouped by→ Labels —into→ Sets
  • 8. Graph Traversal 6 A Traversal —navigates→ a Graph it —identifies→ Paths —which order→ Nodes what music do my friends like that I don’t yet own if this power supply goes down, what web services are affected?
  • 9. Graph Index 7 An Index —maps from→ Properties —to either→ Nodes or Rels find the Account for username master-of-graphs
  • 10. Graph 8 A Graph Database —manages a→ Graph and —also manages related→ Indexes
  • 11. What a Graph database looks like 9
  • 12. A Graph Database transforms an RDBMS 10
  • 13. A Graph Database elaborates a Key-Value Store 11 K* = key V* = value
  • 14. A Graph Database relates Column-Family 12 ● BigTable databases are an evolution of key-value, using "families" to allow grouping of rows ● stored in a graph, the families could become hierarchical, and the relationships among data become explicit
  • 15. A Graph Database navigates a Document Store 13 D=Document, S=Subdocument, V=Value, D2/S2 = reference
  • 16. NoSQL Data Models 14 90% of all use cases Relational Databases
  • 17. 15
  • 18. ● intuitive, using a graph model for data representation ● reliable, fully transactional, upholds ACID ● durable and fast, using a custom disk-based, native storage engine ● massively scalable, up to several billion nodes/relationships/properties ● highly-available, when distributed across multiple machines ● expressive, with a powerful, human readable declarative graph query language ● fast, with a powerful traversal framework for high-speed graph queries ● embeddable, with a few small jars ● simple, accessible by a convenient REST API interface or an object-oriented Java API ● indexes are based on Apache Lucene, supports Secondary Indexes ● has been in commercial development for 10 years and in production for over 7 years; since 2003; ● Cross-platform; Simple set-up; Well documented; Open source; ● GPL for Community, AGPL for Enterprise 16 Neo4j features
  • 19. ● CPU - Intel Core i3/i7 ● Memory - 2GB .. 16/32GB ● Disk - 10GB SATA .. SSD w/ SATA ● Filesystem - ext4 .. ext4/ZFS ● Software - Oracle JAVA 7 17 Neo4j requirements
  • 20. ● Neo4j Community ○ Open-Source High Performance ○ fully ACID transactional graph database ● Neo4j Enterprise ○ High-Performance Cache (up to 10x faster) ○ Horizontal scalability with Neo4j Clustering (predictable scalability) ○ High-availability and online backups ○ Cache based sharding (shard your graph in memory) ○ Advanced Monitoring (operational metrics) ○ Certified for Windows and Linux ○ Email/Phone Support (10x5, 24x7 hours) ○ Subscriptions ■ Personal (up to 3 devs, $100k annual revenue) = FREE ■ Startups (<$10M funding, <$5M annual revenue) = $12k ■ Business (medium, to Global 2000) = Contact Sales 18 Neo4j license
  • 21. 19 ● for the simple friends of friends query, Neo4j is 60% faster than MySQL ● for friends of friends of friends, Neo4j is 180 times faster ● and for the depth four query, Neo4j is 1,135 times faster ● and MySQL just chokes on the depth 5 query Neo4j vs. MySQL
  • 22. Neo4j: Nodes ● fundamental units that form a graph ● can have key/value-style properties ● index nodes and relationships by {key, value} pairs ● represent entities 20
  • 23. Neo4j: Relationships #1/2 ● connect entities and structure domain ● allow for finding related data ● are always directed (outgoing or incoming) ● are equally well traversed in either direction ● a node can have a relationship to itself ● have a relationship type (label) 21
  • 25. Neo4j: Properties ● nodes and relationships can have properties ● are key-value pairs ○ key is a string ○ values can be either a primitive or an array of one primitive type ■ boolean, String, int, int[], etc ■ Java Language Specification ● entity attributes, rels qualities, and metadata 23
  • 26. Neo4j: Labels ● used to group nodes into sets ● any number of labels, including none ● can be added and removed during runtime ● can be used to mark temporary states for nodes ● names case-sensitive ● CamelCase (convention) 24
  • 27. Neo4j: Paths ● is one or more nodes with connecting relationships ● shortest path: ● a path of length one: ● a path of length one: 25
  • 28. Neo4j: Traversal ● Traversal Framework out of the box ● means visiting nodes, following relationships by rules ● in most cases only a subgraph is visited ● callback based traversal API ○ you can specify the traversal rules ● traversing breadth- or depth-first ● open Java API 26
  • 29. Neo4j: graph algorithms ● A* (> uses the A* algorithm to find the cheapest path between two nodes) ● Dijkstra (dijkstra > Dijkstra algorithm to find the cheapest path between two nodes) ● PathWithLength (> all paths of a certain length (depth) between two nodes) ● Shortest paths (shortestPath Default > find all the shortest paths between two nodes) ● All simple paths (allSimplePaths > find all simple paths between two nodes; without loops;) ● All paths (allPaths > find all available paths between two nodes) 27
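To make the "shortest paths" entry above concrete, here is a hedged Python sketch of a breadth-first search over an unweighted graph. It only illustrates the general technique and is not Neo4j's implementation; the function name shortest_path is hypothetical, and the sample graph reuses the vertex and edge sets from the "What is a Graph in math" slide.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # sorted() only makes the visiting order deterministic.
        for neighbour in sorted(adjacency[node]):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # goal is not reachable from start


# Sample undirected graph: V = {1..6} with the edge set from the math slide.
graph = {1: {2, 5}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3, 5, 6}, 5: {1, 2, 4}, 6: {4}}
path = shortest_path(graph, 1, 6)
```

Breadth-first search finds a path with the fewest relationships; for weighted "cheapest path" queries, Dijkstra or A* from the list above would be the matching technique.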
  • 30. Neo4j: Schema ● is schema-optional graph database 28
  • 31. ● introduced in Neo4j 2.0 ● eventually available (populating in the background, is not immediately available for querying) ○ come online after fully populated ○ failed status (drop and recreate the index) ● can be created on labels group ● indexed Nodes & Rels ● node_auto_indexing=false, node_keys_indexable Neo4j: Index 29
  • 32. Neo4j: Constraints ● can help you keep your data clean ● specify the rules for what your data should look like ● unique constraints are the only available constraint type 30
  • 33. ● single server instance ○ nodes = 2^35 (~34 billion) ○ relationships = 2^35 (~34 billion) ○ labels = 2^31 (~2 billion) ○ properties = 2^36 to 2^38 depending on property types (maximum ~274 billion, always at least ~68 billion) ○ relationship types = 2^15 (~ 32’000) 31 Neo4j: Data Size
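The rounded figures on the slide above can be checked with a few lines of Python. The exponents come from the slide; the dictionary keys are just illustrative names.

```python
# Powers of two quoted on the "Data Size" slide for a single server instance.
limits = {
    "nodes": 2 ** 35,
    "relationships": 2 ** 35,
    "labels": 2 ** 31,
    "properties_min": 2 ** 36,
    "properties_max": 2 ** 38,
    "relationship_types": 2 ** 15,
}

for name, value in limits.items():
    # Print each limit with thousands separators for easier reading.
    print(f"{name}: {value:,}")
```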
  • 34. ● powerful graph query language ● relatively simple ● declarative grammar (say what you want, not how) ● humane query language ● self-explanatory (based on English prose and neat iconography) ● written in Scala ● pattern-matching (borrows expression approaches from SPARQL) ● aggregation, ordering, limits ● create, update, delete ● structure and most of keywords inspired by SQL ● changing rather rapidly (CYPHER 1.9 START ...) Cypher Query Language 32 “Makes the simple things easy, and the complex things possible”
  • 35. Cypher patterns #1/2 33 ● (a) ● (b) ● (a)-->(b) ● (a)-->(b)-->(c) ● (b)-->(c)<--(a) ● (b)-->()<--(a) ● (a)--(b) ● (a)-[*5]->(b) ● (a)-[*3..5]->(b) ○ (a)-[*3..]->(b) ○ (a)-[*..5]->(b) ○ (a)-[*]->(b)
  • 36. Cypher patterns #2/2 34 ● (a:Label)-->(m) ● (a:User:Admin)-->(m) ● (a)--(m) ● (a)-[r]->(m) ● (a)-[:ACTED_IN]->(m) ● (a)-[r:SOME|ELSE|WTH]->(m)
  • 37. Cypher: START / RETURN “It all starts with the START” Michael Hunger, Cypher webinar, Sep 2012 ● designates the start points ● START is optional (in Neo4j >= 2.0) Examples: ● START <lookup> RETURN <expression> ● START n=node(0) RETURN n ● START n=node(*) RETURN n.name 35
  • 38. Cypher: MATCH ● primary way of getting data from the database ● START <lookup> MATCH <pattern> RETURN <expr> ● OPTIONAL MATCH <lookup> RETURN <expr> Examples: ● MATCH (n) RETURN count(n) ● MATCH (actor:Actor) RETURN actor.name; ● START me=node(0) MATCH (me)--(f) RETURN f.name ● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO 36
  • 39. ● creates nodes and relationships ● CREATE (<name>[:label] [properties,..]) ● CREATE (<node-in>)-[<var>:RELATION [properties,..]]->(<node-out>); ● CREATE UNIQUE ... Examples: ● CREATE (n:Actor { name:"Keanu Reeves" }); ● CREATE (keanu)-[:ACTED_IN]->(matrix) ● MATCH (keanu {name:".."}) SET keanu.age=49 RETURN keanu Cypher: CREATE / SET 37
  • 40. Cypher: WHERE ● filters the results ● MATCH <pattern> WHERE <condition> RETURN <expr> Examples: ● WHERE n.name =~ "(?i)John.*" ● WHERE NOT .. ● WHERE type(rel) =~ "Perso.*" 38
  • 41. Cypher: RETURN ● creates the result table ● any query can return data ● can be nodes, relationships, or properties on these ● RETURN DISTINCT <expression> AS x ● RETURN aggregate(expr) as alias ● RETURN nodes, rels, properties ● RETURN expressions of funcs and operators ● RETURN aggregation funcs on the above 39
  • 42. Cypher: etc ● CASE / WHEN / ELSE ● ORDER BY node.key, node2.key, .. ASC|DESC ● LIMIT / SKIP ● WITH (WITH count(*) as c) ● UNION / UNION ALL (combining results from multiple queries) ● USING INDEX/SCAN ● MERGE / SET / DELETE / REMOVE / FOREACH ● Expressions ● Operators ● Comments ● Functions: ALL, ANY, LENGTH, {Math}, {String}, ... 40
  • 43. ● any updating query will run in a transaction ● ACID ● “it is very important to finish each transaction” ● write lock on node/rel: ○ adding, changing or removing prop on a node/rel ● write lock on node: ○ creating or deleting a node ● write lock on relationship and both its nodes: ○ creating or deleting a relationship Cypher: Transactions 41
  • 44. Cypher: Aggregation ● count(node/rel/prop) ● count(n), count(n.prop) ● sum(n.prop) ● avg(n.prop) ● percentileDisc(n.prop, {median}) ● stdev(n.prop) - calculate deviation from group ● max(n.prop) ● collect(n.prop) ● RETURN n, count(*) 42
  • 45. ● SELECT * FROM Person WHERE name="Valentin" AND age > 30 ● START person=node:Person(name="Valentin") WHERE person.age > 30 RETURN person Cypher: back to SQL #1/5 43
  • 46. Cypher: back to SQL #2/5 ● SELECT "Email".* FROM Person JOIN "Email" ON "Person".id = "Email".person_id WHERE "Person".name = "Benedikt" ● START person=node:Person(name="Benedikt") MATCH person-[:email]->email RETURN email 44
  • 47. Cypher: back to SQL #3/5 ● show me all people that are both actors and directors ● SELECT name FROM Person WHERE person_id IN (SELECT person_id FROM Actor) AND person_id IN (SELECT person_id FROM Director) ● START person=node:Person("name:*") WHERE (person)-[:ACTS_IN]->() AND (person)-[:DIRECTED]->() RETURN person.name 45
  • 48. Cypher: back to SQL #4/5 ● show me all Tom Hanks’s co-actors ● SELECT DISTINCT co_actor.name FROM Person tom JOIN Actor a1 ON tom.person_id = a1.person_id JOIN Actor a2 ON a1.movie_id = a2.movie_id JOIN Person co_actor ON co_actor.person_id = a2.person_id WHERE tom.name = "Tom Hanks" ● START tom=node:Person(name="Tom Hanks") MATCH tom-[:ACTS_IN]->movie, co_actor-[:ACTS_IN]->movie RETURN DISTINCT co_actor.name 46
  • 49. Cypher: back to SQL #5/5 ● show me all Lucy’s favorite directors ● SELECT dir.name, count(*) FROM Person lucy JOIN Actor ON lucy.person_id = Actor.person_id JOIN Director ON Actor.movie_id = Director.movie_id JOIN Person dir ON Director.person_id = dir.person_id WHERE lucy.name = "Lucy Liu" GROUP BY dir.name ORDER BY count(*) DESC ● START lucy=node:Person(name="Lucy Liu") MATCH lucy-[:ACTS_IN]->movie, director-[:DIRECTED]->movie RETURN director.name, count(*) ORDER BY count(*) DESC 47
  • 50. START lucy = node:Person(name="Lucy Liu"), kevin = node:Person(name="Kevin Bacon") MATCH p = shortestPath( lucy-[:ACTS_IN*]-kevin ) RETURN EXTRACT (n in NODES(p): COALESCE(n.name?, n.title?)) 48 Cypher: back to SQL #6/5
  • 51. Neo4j Shell ● command-line shell for running Cypher queries ● supports remote shell ● :schema ● bash# neo4j-shell -path data/graph.db -readonly -config conf/neo4j.properties -c "<command>" 49
  • 52. Neo4j: Security ● does not deal with data encryption explicitly ● can use any of the security mechanisms built into Java ● can be run on an encrypted datastore ● webadmin over HTTPS 50
  • 53. ● manipulates data stored in RDF format ● focused on matching sets of triples PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?email WHERE { ?person a foaf:Person. ?person foaf:name ?name. ?person foaf:mbox ?email. } SPARQL 51
  • 54. ● graph traversal language ● scripting language ● Pipe & Filter (similar to jQuery) ● across different graph databases ● based on Groovy (limited to Java) ● not as stable in Neo4j ● XPath like ● ./outE[label="family"]/inV/@name ● g.v(1).out('likes').in('likes').out('likes').groupCount(m) ● g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000} ● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2) Gremlin 52
  • 55. Neo4j and PHP ● everyman/neo4jphp < packagist.org ○ PHP wrapper for Neo4j using the REST interface ○ Follows the PSR-0 autoloading standard ○ Basic wrappers for all components ○ Last update - a month ago ○ supports Gremlin ● Neo4j-PHP OGM < a lot of based on ○ Object Graph Mapper, inspired by Doctrine ○ based on DoctrineCommon ○ borrows significantly from the DoctrineORM design ○ uses annotations on classes ○ MIT Licence ● Neo4J PHP REST API client ○ Using Neo4j REST API ○ Node create/find/delete ○ Relationship create/list/filter 53
  • 56. High Availability with Neo4j ● in HA - a single master and zero or more slaves ● slaves synchronize with the master to preserve consistency ● master writes to a slave before the transaction completes 54
  • 57. Demo Neo4j.org Example Datasets: ● DrWho (nodes=1'060; rels=2'286) ● Cineasts Movies & Actors (nodes=64'069; rels=121'778) ● Hubway Data Challenge (nodes=554'674; rels=2'011'904) GraphGist: ● JIRA and neo4j ● PHP and neo4j ● Kant in neo4j XSS 55
  • 58. Gephi (win, nix, mac) 56
  • 63. Neovigator (neography + processing.js) 61
  • 64. ● Heroku ○ GrapheneDB beta ○ bash$ heroku addons:add graphenedb ● Jelastic Cloud PaaS Cloud 62
  • 65. ● GrapheneDB - based on Neo4j ● AllegroGraph - Closed Source, Commercial, RDF-QuadStore ○ graph database built around the W3C spec for the Resource Description Framework ○ supports SPARQL, RDFS++, and Prolog ● Sones - Closed Source, .NET focused ● Virtuoso - Closed Source, RDF focused ● GraphDB - graph database built in .NET by the German company sones ● InfiniteGraph - goal is to create a graph database with "virtually unlimited scalability." ● FlockDB Analogues 63
  • 66. Docs ● http://docs.neo4j.org/chunked/snapshot/ ● http://docs.neo4j.org/refcard/2.0/ ● http://graphdatabases.com/ - book, O'REILLY ● http://www.cs.usfca.edu/~galles/visualization/Algorithms.html - Graph Algorithms visualization ● http://bit.ly/rr-neo4j ● https://github.com/itspoma/test-neo4j 64
  • 67. ● best used for graph-style, rich or complex, structured dense data, deep graphs with unlimited depth and cycles, with weighted connections, interconnected data ● quickly add new functionality without impacting existing deployments ● schema-less, which forces you to re-think your entire approach to data ● not a silver bullet for all problems Conclusion