Neo4j: Graph-like power
Graph Databases in NoSQL world. Neo4j and Cypher.

Usage Rights

© All Rights Reserved

Neo4j: Graph-like power Presentation Transcript

  • Graph-like power
    Roman R.
    MATCH (a:Actor), (m:Movie) WHERE a.name = 'Keanu Reeves' AND m.title = 'The Matrix' CREATE (a)-[:ACTS_IN]->(m)
  • Today
    ○ Graphs in NoSQL world
    ○ classification
    ○ definition
    ○ components
    ○ Neo4j
    ○ nodes, rels, props, indexes
    ○ Cypher
    ○ PHP and Neo4j
    ○ Demo
    ○ Alternatives
    ○ Q/A
  • NoSQL Databases
    ● categories: Key-Value, Document, Graph, Column (BigTable)
    ● products shown: MemcacheDB, Redis, Riak, Cassandra, CouchDB, Neo4j, TITAN, HBase/Hadoop, OrientDB, Elasticsearch, RavenDB, Tokyo Cabinet, InfiniteGraph, AllegroGraph, MongoDB
  • What is a Graph in math
    ● represents a set of connected objects
    ● graph:
      ○ vertex (node/point)
      ○ edge (arc/line/relationship/arrow) - undirected
      ○ attribute (property) - on node/relationship
    ● types:
      ○ pair: G = (V, E)
      ○ digraph: D = (V, A)
      ○ mixed: G = (V, E, A)
    ● example:
      ○ V = {1, 2, 3, 4, 5, 6}
      ○ E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
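The example graph G = (V, E) from the slide can be turned into an adjacency structure with a few lines of Python. This is only an illustrative sketch; the helper name `build_adjacency` is an assumption and not part of the talk.

```python
# V and E are copied from the slide; edges are undirected pairs.
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def build_adjacency(vertices, edges):
    """Return a dict mapping each vertex to the set of its neighbours."""
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        # An undirected edge is stored in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(V, E)
# Degree of a vertex = number of edges touching it.
degrees = {v: len(neighbours) for v, neighbours in adjacency.items()}

for vertex in sorted(adjacency):
    print(vertex, sorted(adjacency[vertex]), "degree:", degrees[vertex])
```

Every edge contributes to the degree of two vertices, so the degrees always sum to twice the number of edges.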
  • What is a Graph database
    ● stores data in a graph and retrieves vast networks of data
    ● shines when storing richly-connected data
    ● consists of nodes, connected by relationships
      ○ A Graph —records data in→ Nodes —which have→ Properties
      ○ Nodes —are organized by→ Rels —which also have→ Properties
      ○ Nodes —are grouped by→ Labels —into→ Sets
      ○ A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
      ○ An Index —maps from→ Properties —to either→ Nodes or Rels
      ○ A Graph Database —manages a→ Graph and —also manages related→ Indexes
  • Nodes, Rels, Props, Labels
    ● A Graph —records data in→ Nodes —which have→ Properties
    ● Nodes —are organized by→ Relationships —which also have→ Properties
    ● Nodes —are grouped by→ Labels —into→ Sets
  • Graph Traversal
    A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
    ● what music do my friends like that I don’t yet own?
    ● if this power supply goes down, what web services are affected?
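The power-supply question on this slide is a reachability traversal. The sketch below answers it with a plain breadth-first search in Python rather than with Neo4j's traversal framework. The dependency graph, the node names and the `affected_by` helper are all hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: an entry X -> [Y, ...] means "Y depends on X".
DEPENDENTS = {
    "power-supply-1": ["rack-a"],
    "rack-a": ["db-server", "app-server"],
    "db-server": ["billing-service"],
    "app-server": ["billing-service", "search-service"],
    "billing-service": [],
    "search-service": [],
    "power-supply-2": ["rack-b"],
    "rack-b": ["mail-service"],
    "mail-service": [],
}


def affected_by(start, dependents):
    """Breadth-first traversal: everything reachable from `start`."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, []):
            if nxt not in seen:  # visit each node only once
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


affected = affected_by("power-supply-1", DEPENDENTS)
# Keep only the nodes that represent web services (naming convention assumed).
web_services = [n for n in affected if n.endswith("-service")]
print(web_services)
```

Only the subgraph reachable from the failed power supply is visited. Nodes behind `power-supply-2` are never touched.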
  • Graph Index
    An Index —maps from→ Properties —to either→ Nodes or Rels
    ● find the Account for username master-of-graphs
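Conceptually, such an index is a lookup table from a property value to the nodes that carry it. The Python sketch below illustrates the idea with a dictionary. It does not show how Neo4j stores its indexes, and the node data and the `build_index` helper are made up for the example.

```python
# Hypothetical nodes: id -> labels and properties.
nodes = {
    1: {"labels": {"Account"}, "username": "master-of-graphs"},
    2: {"labels": {"Account"}, "username": "graph-newbie"},
    3: {"labels": {"Movie"}, "title": "The Matrix"},
}


def build_index(nodes, label, key):
    """Map each value of property `key` to the ids of nodes with `label`."""
    index = {}
    for node_id, props in nodes.items():
        if label in props["labels"] and key in props:
            index.setdefault(props[key], []).append(node_id)
    return index


username_index = build_index(nodes, "Account", "username")
# "find the Account for username master-of-graphs" becomes one lookup.
found = username_index.get("master-of-graphs", [])
print(found)
```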
  • Graph
    A Graph Database —manages a→ Graph and —also manages related→ Indexes
  • What a Graph database looks like
  • A Graph Database transforms an RDBMS
  • A Graph Database elaborates a Key-Value Store
    K* = key, V* = value
  • A Graph Database relates Column-Family
    ● BigTable databases are an evolution of key-value, using "families" to allow grouping of rows
    ● stored in a graph, the families could become hierarchical, and the relationships among data become explicit
  • A Graph Database navigates a Document Store
    D = Document, S = Subdocument, V = Value, D2/S2 = reference
  • NoSQL Data Models
    (chart labels: "Relational Databases", "90% of all use cases")
  • Neo4j features
    ● intuitive, using a graph model for data representation
    ● reliable, fully transactional, upholds ACID
    ● durable and fast, using a custom disk-based, native storage engine
    ● massively scalable, up to several billion nodes/relationships/properties
    ● highly available, when distributed across multiple machines
    ● expressive, with a powerful, human-readable declarative graph query language
    ● fast, with a powerful traversal framework for high-speed graph queries
    ● embeddable, with a few small jars
    ● simple, accessible through a convenient REST API or an object-oriented Java API
    ● indexes are based on Apache Lucene; supports secondary indexes
    ● has been in commercial development for 10 years (since 2003) and in production for over 7 years
    ● cross-platform; simple set-up; well documented; open source
    ● GPL for Community, AGPL for Enterprise
  • Neo4j requirements
    ● CPU - Intel Core i3/i7
    ● Memory - 2GB .. 16/32GB
    ● Disk - 10GB SATA .. SSD w/ SATA
    ● Filesystem - ext4 .. ext4/ZFS
    ● Software - Oracle Java 7
  • Neo4j license
    ● Neo4j Community
      ○ Open-Source High Performance
      ○ fully ACID transactional graph database
    ● Neo4j Enterprise
      ○ High-Performance Cache (up to 10x faster)
      ○ Horizontal scalability with Neo4j Clustering (predictable scalability)
      ○ High-availability and online backups
      ○ Cache based sharding (shard your graph in memory)
      ○ Advanced Monitoring (operational metrics)
      ○ Certified for Windows and Linux
      ○ Email/Phone Support (10x5, 24x7 hours)
      ○ Subscriptions
        ■ Personal (up to 3 devs, $100k annual revenue) = FREE
        ■ Startups (<$10M funding, <$5M annual revenue) = $12k
        ■ Business (medium, to Global 2000) = Contact Sales
  • Neo4j vs. MySQL
    ● for the simple friends of friends query, Neo4j is 60% faster than MySQL
    ● for friends of friends of friends, Neo4j is 180 times faster
    ● for the depth 4 query, Neo4j is 1,135 times faster
    ● MySQL just chokes on the depth 5 query
  • Neo4j: Nodes
    ● fundamental units that form a graph
    ● can have key/value-style properties
    ● nodes and relationships can be indexed by {key, value} pairs
    ● represent entities
  • Neo4j: Relationships #1/2
    ● connect entities and structure the domain
    ● allow for finding related data
    ● are always directed (outgoing or incoming)
    ● are equally well traversed in either direction
    ● a node can have a relationship to itself
    ● have a relationship type (label)
  • Neo4j: Relationships #2/2
  • Neo4j: Properties
    ● nodes and relationships can have properties
    ● are key-value pairs
      ○ key is a string
      ○ value is either a primitive or an array of one primitive type
        ■ boolean, String, int, int[], etc
        ■ types follow the Java Language Specification
    ● describe entity attributes, relationship qualities, and metadata
  • Neo4j: Labels
    ● used to group nodes into sets
    ● a node can have any number of labels, including none
    ● can be added and removed at runtime
    ● can be used to mark temporary states for nodes
    ● names are case-sensitive
    ● CamelCase (convention)
  • Neo4j: Paths
    ● a path is one or more nodes with connecting relationships
    ● shortest path, a path of length one, a path of length one (diagrams not reproduced in the transcript)
  • Neo4j: Traversal
    ● Traversal Framework out of the box
    ● means visiting nodes, following relationships according to rules
    ● in most cases only a subgraph is visited
    ● callback-based traversal API
      ○ you can specify the traversal rules
    ● traversing breadth- or depth-first
    ● open Java API
  • Neo4j: graph algorithms
    ● A*: uses the A* algorithm to find the cheapest path between two nodes
    ● Dijkstra (dijkstra): uses the Dijkstra algorithm to find the cheapest path between two nodes
    ● PathWithLength: finds all paths of a certain length (depth) between two nodes
    ● Shortest paths (shortestPath, default): finds all the shortest paths between two nodes
    ● All simple paths (allSimplePaths): finds all simple paths (without loops) between two nodes
    ● All paths (allPaths): finds all available paths between two nodes
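To make "cheapest path between two nodes" concrete, here is a minimal Dijkstra sketch in plain Python. It is not Neo4j's implementation; the weighted graph `ROADS` and the function signature are assumptions made for the example.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total_cost, path) for the cheapest path, or None if unreachable.

    `graph` maps a node to a dict of {neighbour: non-negative edge weight}.
    """
    queue = [(0, source, [source])]  # (cost so far, node, path taken)
    settled = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in settled:
            continue  # a cheaper route to this node was already processed
        settled[node] = cost
        if node == target:
            return cost, path
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical weighted, undirected graph.
ROADS = {
    "A": {"B": 7, "C": 9, "F": 14},
    "B": {"A": 7, "C": 10, "D": 15},
    "C": {"A": 9, "B": 10, "D": 11, "F": 2},
    "D": {"B": 15, "C": 11, "E": 6},
    "E": {"D": 6, "F": 9},
    "F": {"A": 14, "C": 2, "E": 9},
}

result = dijkstra(ROADS, "A", "E")
print(result)
```

The direct edge A-F costs 14, but the detour A-C-F costs only 11, which is why the cheapest route to E goes through C.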
  • Neo4j: Schema
    ● Neo4j is a schema-optional graph database
  • Neo4j: Index
    ● introduced in Neo4j 2.0
    ● eventually available (populated in the background, not immediately available for querying)
      ○ comes online once fully populated
      ○ on failed status, drop and recreate the index
    ● can be created for a label (a group of nodes)
    ● indexes Nodes & Rels
    ● node_auto_indexing=false, node_keys_indexable
  • Neo4j: Constraints
    ● can help you keep your data clean
    ● specify the rules for what your data should look like
    ● uniqueness constraints are the only available constraint type
  • Neo4j: Data Size
    ● single server instance
      ○ nodes = 2^35 (~34 billion)
      ○ relationships = 2^35 (~34 billion)
      ○ labels = 2^31 (~2 billion)
      ○ properties = 2^36 to 2^38 depending on property types (maximum ~274 billion, always at least ~68 billion)
      ○ relationship types = 2^15 (~32'000)
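The rounded figures on this slide follow directly from the powers of two. A quick Python check, using the exponents from the slide (the dictionary keys are just descriptive names):

```python
# Exponents taken from the slide; each limit is a power of two.
LIMIT_EXPONENTS = {
    "nodes": 35,
    "relationships": 35,
    "labels": 31,
    "properties (at least)": 36,
    "properties (at most)": 38,
    "relationship types": 15,
}


def limits(exponents):
    """Map each limit name to its exact value 2**exponent."""
    return {name: 2 ** exp for name, exp in exponents.items()}


LIMITS = limits(LIMIT_EXPONENTS)

for name, value in LIMITS.items():
    print(f"{name}: {value:,}")
```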
  • Cypher Query Language
    “Makes the simple things easy, and the complex things possible”
    ● powerful graph query language
    ● relatively simple
    ● declarative grammar (say what you want, not how)
    ● humane query language
    ● self-explanatory (based on English prose and neat iconography)
    ● written in Scala
    ● pattern-matching (borrows expression approaches from SPARQL)
    ● aggregation, ordering, limits
    ● create, update, delete
    ● structure and most keywords inspired by SQL
    ● changing rather rapidly (CYPHER 1.9 START ...)
  • Cypher patterns #1/2
    ● (a)
    ● (b)
    ● (a)-->(b)
    ● (a)-->(b)-->(c)
    ● (b)-->(c)<--(a)
    ● (b)-->()<--(a)
    ● (a)--(b)
    ● (a)-[*5]->(b)
    ● (a)-[*3..5]->(b)
      ○ (a)-[*3..]->(b)
      ○ (a)-[*..5]->(b)
      ○ (a)-[*]->(b)
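The last patterns on this slide are variable-length patterns: they match directed paths whose number of relationships falls in a given range (for example 3 to 5 hops). The Python sketch below imitates that matching over an edge list. It is an illustration only, not how Cypher evaluates patterns; the `variable_length_paths` helper and the `EDGES` data are hypothetical.

```python
def variable_length_paths(edges, start, end, min_hops, max_hops):
    """Node sequences of directed paths from start to end using between
    min_hops and max_hops relationships, never reusing a relationship."""
    results = []

    def walk(node, path, used):
        hops = len(path) - 1
        if node == end and min_hops <= hops <= max_hops:
            results.append(list(path))
        if hops == max_hops:
            return  # upper bound reached, do not extend further
        for index, (src, dst) in enumerate(edges):
            if src == node and index not in used:
                walk(dst, path + [dst], used | {index})

    walk(start, [start], frozenset())
    return results


# Hypothetical directed relationships (source, destination).
EDGES = [("a", "x"), ("x", "b"), ("a", "b"), ("x", "y"), ("y", "b")]

one_to_two = variable_length_paths(EDGES, "a", "b", 1, 2)
two_to_three = variable_length_paths(EDGES, "a", "b", 2, 3)
print(one_to_two)
print(two_to_three)
```

Raising the lower bound drops the direct a-to-b relationship from the result, and raising the upper bound admits the longer route through y.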
  • Cypher patterns #2/2
    ● (a:Label)-->(m)
    ● (a:User:Admin)-->(m)
    ● (a)--(m)
    ● (a)-[r]->(m)
    ● (a)-[:ACTED_IN]->(m)
    ● (a)-[r:SOME|ELSE|WTH]->(m)
  • Cypher: START / RETURN
    “It all starts with the START” (Michael Hunger, Cypher webinar, Sep 2012)
    ● designates the start points
    ● START is optional (in Neo4j >= 2.0)
    Examples:
    ● START <lookup> RETURN <expression>
    ● START n=node(0) RETURN n
    ● START n=node(*) RETURN n.name
  • Cypher: MATCH
    ● primary way of getting data from the database
    ● START <lookup> MATCH <pattern> RETURN <expr>
    ● OPTIONAL MATCH <pattern> RETURN <expr>
    Examples:
    ● MATCH (n) RETURN count(n)
    ● MATCH (actor:Actor) RETURN actor.name;
    ● START me=node(0) MATCH (me)--(f) RETURN f.name
    ● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO
  • Cypher: CREATE / SET
    ● creates nodes and relationships
    ● CREATE (<name>[:label] [properties,..])
    ● CREATE (<node-in>)-[<var>:RELATION [properties,..]]->(<node-out>);
    ● CREATE UNIQUE ...
    Examples:
    ● CREATE (n:Actor { name:"Keanu Reeves" });
    ● CREATE (keanu)-[:ACTED_IN]->(matrix)
    ● MATCH (keanu {name:".."}) SET keanu.age=49 RETURN keanu
  • Cypher: WHERE
    ● filters the results
    ● MATCH <pattern> WHERE <condition> RETURN <expr>
    Examples:
    ● WHERE n.name =~ "(?i)John.*"
    ● WHERE NOT ..
    ● WHERE type(rel) =~ "Perso.*"
  • Cypher: RETURN
    ● creates the result table
    ● any query can return data
    ● can be nodes, relationships, or properties on these
    ● RETURN DISTINCT <expression> AS x
    ● RETURN aggregate(expr) AS alias
    ● RETURN nodes, rels, properties
    ● RETURN expressions of funcs and operators
    ● RETURN aggregation funcs on the above
  • Cypher: etc
    ● CASE / WHEN / ELSE
    ● ORDER BY node.key, node2.key, .. ASC|DESC
    ● LIMIT / SKIP
    ● WITH (WITH count(*) AS c)
    ● UNION / UNION ALL (combining results from multiple queries)
    ● USING INDEX/SCAN
    ● MERGE / SET / DELETE / REMOVE / FOREACH
    ● Expressions
    ● Operators
    ● Comments
    ● Functions: ALL, ANY, LENGTH, {Math}, {String}, ...
  • Cypher: Transactions
    ● any updating query will run in a transaction
    ● ACID
    ● “it is very important to finish each transaction”
    ● write lock on the node/rel:
      ○ adding, changing or removing a property on a node/rel
    ● write lock on the node:
      ○ creating or deleting a node
    ● write lock on the relationship and both its nodes:
      ○ creating or deleting a relationship
  • Cypher: Aggregation
    ● count(node/rel/prop)
    ● count(n), count(n.prop)
    ● sum(n.prop)
    ● avg(n.prop)
    ● percentileDisc(n.prop, {median}) - {median} is a percentile parameter, e.g. 0.5
    ● stdev(n.prop) - standard deviation for the group
    ● max(n.prop)
    ● collect(n.prop)
    ● RETURN n, count(*)
  • Cypher: back to SQL #1/5
    ● SELECT * FROM Person WHERE name = 'Valentin' AND age > 30
    ● START person=node:Person(name="Valentin") WHERE person.age > 30 RETURN person
  • Cypher: back to SQL #2/5
    ● SELECT "Email".* FROM "Person" JOIN "Email" ON "Person".id = "Email".person_id WHERE "Person".name = 'Benedikt'
    ● START person=node:Person(name="Benedikt") MATCH person-[:email]->email RETURN email
  • Cypher: back to SQL #3/5
    ● show me all people that are both actors and directors
    ● SELECT name FROM Person WHERE person_id IN (SELECT person_id FROM Actor) AND person_id IN (SELECT person_id FROM Director)
    ● START person=node:Person("name:*") WHERE (person)-[:ACTS_IN]->() AND (person)-[:DIRECTED]->() RETURN person.name
  • Cypher: back to SQL #4/5
    ● show me all Tom Hanks’s co-actors
    ● SELECT DISTINCT co_actor.name FROM Person tom JOIN Actor a1 ON tom.person_id = a1.person_id JOIN Actor a2 ON a1.movie_id = a2.movie_id JOIN Person co_actor ON co_actor.person_id = a2.person_id WHERE tom.name = 'Tom Hanks'
    ● START tom=node:Person(name="Tom Hanks") MATCH tom-[:ACTS_IN]->movie, co_actor-[:ACTS_IN]->movie RETURN DISTINCT co_actor.name
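The co-actor query can be tried out without a Neo4j server. The sketch below runs a join of the same shape against an in-memory SQLite database and then answers the same question by walking the ACTS_IN pairs directly. The table layout (Person, Actor) follows the slide; the sample rows, the numeric ids and the extra `<>` filter that excludes Tom Hanks himself are assumptions added for the example.

```python
import sqlite3

# Hypothetical sample data.
PEOPLE = [(1, "Tom Hanks"), (2, "Meg Ryan"), (3, "Kevin Bacon"), (4, "Lucy Liu")]
ACTS_IN = [(1, 10), (2, 10), (1, 11), (3, 11), (4, 12)]  # (person_id, movie_id)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Actor (person_id INTEGER, movie_id INTEGER)")
conn.executemany("INSERT INTO Person VALUES (?, ?)", PEOPLE)
conn.executemany("INSERT INTO Actor VALUES (?, ?)", ACTS_IN)

# Relational version: two self-joins through the Actor table.
rows = conn.execute(
    """
    SELECT DISTINCT co_actor.name
    FROM Person tom
    JOIN Actor a1 ON tom.person_id = a1.person_id
    JOIN Actor a2 ON a1.movie_id = a2.movie_id
    JOIN Person co_actor ON co_actor.person_id = a2.person_id
    WHERE tom.name = 'Tom Hanks'
      AND co_actor.person_id <> tom.person_id  -- added: exclude Tom himself
    ORDER BY co_actor.name
    """
).fetchall()
sql_co_actors = [name for (name,) in rows]

# Graph-style version: follow ACTS_IN from Tom to his movies and back.
names = dict(PEOPLE)
movies_of = {}
for person_id, movie_id in ACTS_IN:
    movies_of.setdefault(person_id, set()).add(movie_id)

tom = 1
graph_co_actors = sorted(
    names[p] for p, movies in movies_of.items()
    if p != tom and movies & movies_of[tom]
)

print(sql_co_actors)
print(graph_co_actors)
```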
  • Cypher: back to SQL #5/5
    ● show me all Lucy’s favorite directors
    ● SELECT dir.name, count(*) FROM Person lucy JOIN Actor ON lucy.person_id = Actor.person_id JOIN Director ON Actor.movie_id = Director.movie_id JOIN Person dir ON Director.person_id = dir.person_id WHERE lucy.name = 'Lucy Liu' GROUP BY dir.name ORDER BY count(*) DESC
    ● START lucy=node:Person(name="Lucy Liu") MATCH lucy-[:ACTS_IN]->movie, director-[:DIRECTED]->movie RETURN director.name, count(*) ORDER BY count(*) DESC
  • Cypher: back to SQL #6/5
    ● START lucy=node:Person(name="Lucy Liu"), kevin=node:Person(name="Kevin Bacon") MATCH p = shortestPath( lucy-[:ACTS_IN*]-kevin ) RETURN EXTRACT (n in NODES(p): COALESCE(n.name?, n.title?))
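The shortestPath query above looks for the shortest chain of ACTS_IN relationships between two people, ignoring direction. On an unweighted graph that is a breadth-first search. The Python sketch below shows the idea on made-up data: the intermediate actor and the movie names are placeholders, not real filmography.

```python
from collections import deque

# Hypothetical ACTS_IN relationships: (person, movie).
ACTS_IN = [
    ("Lucy Liu", "Movie A"),
    ("Actor X", "Movie A"),
    ("Actor X", "Movie B"),
    ("Kevin Bacon", "Movie B"),
    ("Lucy Liu", "Movie C"),
]

# Relationships are followed in both directions, as in lucy-[:ACTS_IN*]-kevin.
adjacency = {}
for person, movie in ACTS_IN:
    adjacency.setdefault(person, []).append(movie)
    adjacency.setdefault(movie, []).append(person)


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list of one shortest path."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


path = shortest_path(adjacency, "Lucy Liu", "Kevin Bacon")
print(path)
```

The returned list alternates people and movies, which is what the COALESCE(n.name?, n.title?) expression in the Cypher query extracts.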
  • Neo4j Shell
    ● command-line shell for running Cypher queries
    ● supports remote shell
    ● :schema
    ● bash# neo4j-shell -path data/graph.db -readonly -config conf/neo4j.properties -c "<command>"
  • Neo4j: Security
    ● does not deal with data encryption explicitly
    ● all the means built into Java can be used
    ● an encrypted datastore can be used
    ● webadmin over https
  • SPARQL
    ● manipulates data stored in RDF format
    ● focused on matching sets of triples
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?email
    WHERE {
      ?person a foaf:Person.
      ?person foaf:name ?name.
      ?person foaf:mbox ?email.
    }
  • Gremlin
    ● graph traversal language
    ● scripting language
    ● Pipe & Filter (similar to jQuery)
    ● works across different graph databases
    ● based on Groovy (limited to Java)
    ● not as stable in Neo4j
    ● XPath-like
    ● ./outE[label="family"]/inV/@name
    ● g.v(1).out('likes').in('likes').out('likes').groupCount(m)
    ● g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
    ● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)
  • Neo4j and PHP
    ● everyman/neo4jphp (on packagist.org)
      ○ PHP wrapper for Neo4j using the REST interface
      ○ follows the PSR-0 autoloading standard
      ○ basic wrappers for all components
      ○ last update - a month ago
      ○ supports Gremlin
    ● Neo4j-PHP OGM (largely based on the above)
      ○ Object Graph Mapper, inspired by Doctrine
      ○ based on DoctrineCommon
      ○ borrows significantly from the DoctrineORM design
      ○ uses annotations on classes
      ○ MIT Licence
    ● Neo4J PHP REST API client
      ○ uses the Neo4j REST API
      ○ Node create/find/delete
      ○ Relationship create/list/filter
  • High Availability with Neo4j
    ● in HA mode there is a single master and zero or more slaves
    ● slaves synchronize with the master to preserve consistency
    ● the master writes to a slave before the transaction completes
  • Demo
    Neo4j.org Example Datasets:
    ● DrWho (nodes=1'060; rels=2'286)
    ● Cineasts Movies & Actors (nodes=64'069; rels=121'778)
    ● Hubway Data Challenge (nodes=554'674; rels=2'011'904)
    GraphGist:
    ● JIRA and neo4j
    ● PHP and neo4j
    ● Kant in neo4j
    XSS
  • Gephi (win, nix, mac)
  • Linkurious.us
  • Neoclipse (eclipse plugin)
  • KeyLines (JavaScript library)
  • Graffeine (npm package)
  • Neovigator (neography + processing.js)
  • Cloud
    ● Heroku
      ○ GrapheneDB beta
      ○ bash$ heroku addons:add graphenedb
    ● Jelastic Cloud PaaS
  • Analogues
    ● GrapheneDB - based on neo4j
    ● AllegroGraph - Closed Source, Commercial, RDF-QuadStore
      ○ graph database built around the W3C spec for the Resource Description Framework
      ○ supports SPARQL, RDFS++, and Prolog
    ● Sones - Closed Source, .NET focused
    ● Virtuoso - Closed Source, RDF focused
    ● GraphDB - graph database built in .NET by the German company sones
    ● InfiniteGraph - goal is to create a graph database with "virtually unlimited scalability"
    ● FlockDB
  • Docs
    ● http://docs.neo4j.org/chunked/snapshot/
    ● http://docs.neo4j.org/refcard/2.0/
    ● http://graphdatabases.com/ - book, O'REILLY
    ● http://www.cs.usfca.edu/~galles/visualization/Algorithms.html - Graph Algorithms visualization
    ● http://bit.ly/rr-neo4j
    ● https://github.com/itspoma/test-neo4j
  • Conclusion
    ● best used for graph-style data: rich or complex, structured, dense, interconnected, with weighted connections and deep (even unbounded or cyclical) graphs
    ● lets you quickly add new functionality without impacting existing deployments
    ● being schema-less forces you to rethink your entire approach to data
    ● not a silver bullet for all problems