2012 09 SF Data Mining zero to hero

This is the presentation given by Michael Hunger and Peter Neubauer at the SF Data Mining group, see http://www.meetup.com/Data-Mining/events/80275492/

Usage Rights

© All Rights Reserved

  • A number of different ways to query a graph database already existed. Cypher aims to make querying easy and to produce queries that are readable. We looked at alternatives - SPARQL, SQL, Gremlin and other...
  • Visualization done with GraphViz. A user will have many such “stacks”.
  • Search ranking weighs inbound and outbound node connections as part of the search score calculation.

Presentation Transcript

  • Neo4j - from 0 to Her0. Thank you Trulia!
    Peter Neubauer (@peterneubauer), Andreas Kollegger (@akollegger), Michael Hunger (@mesirii)
    #neo4j
  • Um, Neo for what?
      • Neo4j - the graph database
      • Graph
      • no: not for charts & diagrams, or vector artwork
      • yes: for storing data that is connected
      • remember linked lists, trees?
      • graphs are the general-purpose data structure
  • Neo4j is a Graph Database
    ๏ A Graph Database:
      • a Property Graph with Nodes, Relationships and Properties on both
      • perfect for complex, highly connected data
    ๏ A Graph Database:
      • reliable with real ACID Transactions
      • scalable: 32 Billion Nodes, 32 Billion Relationships, 64 Billion Properties
      • Server with REST API, or Embeddable on the JVM
      • high-performance with High-Availability (read scaling)
  • Graph DB 101
  • You know relational (tables foo, foo_bar, bar). Now consider relationships...
  • We're talking about a Property Graph
      • Nodes
      • Relationships
      • Properties (each a key+value)
      • + Indexes (for easy look-ups)
    [Figure: example graph of people (Emil, Johan, Allison, Tobias, Lars, Andreas, Andrés, Peter, Ian, Delia, Michael and others) connected by "knows" relationships]
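
The property graph model on this slide can be sketched in a few lines of plain Python. This is a toy in-memory structure for illustration only, not Neo4j's API: the class name `PropertyGraph`, its methods, the node ids, the `since` property and who knows whom are all made up for the example.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph (illustrative only, not Neo4j's API)."""

    def __init__(self):
        self.nodes = {}                # node id -> properties dict
        self.rels = []                 # (start id, type, end id, properties dict)
        self.index = defaultdict(set)  # (key, value) -> set of node ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        # Index every property so nodes can be looked up by key + value.
        for key, value in props.items():
            self.index[(key, value)].add(node_id)
        return node_id

    def add_rel(self, start, rel_type, end, **props):
        # Relationships carry a type and their own properties.
        self.rels.append((start, rel_type, end, props))

    def lookup(self, key, value):
        return sorted(self.index[(key, value)])

    def neighbours(self, node_id, rel_type):
        """Follow relationships of one type, ignoring direction."""
        out = {e for s, t, e, _ in self.rels if s == node_id and t == rel_type}
        inc = {s for s, t, e, _ in self.rels if e == node_id and t == rel_type}
        return sorted(out | inc)


# Made-up sample data in the spirit of the slide's "knows" graph.
g = PropertyGraph()
emil = g.add_node(1, name="Emil")
johan = g.add_node(2, name="Johan")
peter = g.add_node(3, name="Peter")
g.add_rel(emil, "knows", johan, since=2008)
g.add_rel(peter, "knows", emil)
```

Here `g.lookup("name", "Emil")` plays the role of an index look-up that finds a starting node, and `g.neighbours(1, "knows")` walks its relationships from there.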
  • And, but, so how do you query this "graph" database?
  • Cypher Query Language
  • What is Cypher?
      • Graph Query Language for Neo4j
      • Aims to make querying simple
  • Design Decisions: Pattern matching (nodes A, B, C)
  • Design Decisions: ASCII-art patterns
    () --> ()
  • Design Decisions: Directed relationship
    (A) --> (B)
  • Design Decisions: Undirected relationship
    (A) -- (B)
  • Design Decisions: specific relationships
    A -[:LOVES]-> B
  • Design Decisions: Joined paths
    A --> B --> C
  • Design Decisions: multiple paths
    A --> B --> C, A --> C
    A --> B --> C <-- A
  • Design Decisions: Optional relationships
    A -[?]-> B
  • SELECT skills.*, user_skill.*
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
  • START user = node(1)
    MATCH user -[user_skill]-> skill
    RETURN skill, user_skill
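
To make the contrast between the two slides concrete, here is a minimal sketch that answers the same question twice: once with the SQL join (run against an in-memory SQLite database) and once by walking relationships from a start node. The table contents, the `level` column and the names are assumptions invented for the example, and the dictionary-based graph is only a stand-in for a graph database.

```python
import sqlite3

# Relational side: the join from the slide, against made-up sample data.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level INTEGER);
    INSERT INTO users VALUES (1, 'Max'), (2, 'Anna');
    INSERT INTO skills VALUES (10, 'Java'), (11, 'Cypher');
    INSERT INTO user_skill VALUES (1, 10, 3), (1, 11, 5), (2, 10, 4);
""")
sql_rows = db.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.name
""").fetchall()

# Graph side: each node keeps its own outgoing relationships, so the same
# question becomes a walk from the start node instead of two joins.
nodes = {1: {"name": "Max"}, 2: {"name": "Anna"},
         10: {"name": "Java"}, 11: {"name": "Cypher"}}
outgoing = {1: [({"level": 3}, 10), ({"level": 5}, 11)],
            2: [({"level": 4}, 10)]}

graph_rows = sorted((nodes[skill]["name"], rel["level"])
                    for rel, skill in outgoing[1])
```

Both `sql_rows` and `graph_rows` hold the skills of user 1 together with the property stored on the connecting row or relationship.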
  • Indexes: used as multiple starting points, not to speed up any traversals
    START a = node:nodes_index(type = "User")
    MATCH a-[r:knows]-b
    RETURN ID(a), ID(b), r.weight
  • Variable length Path Match
    SQL: some UGLY recursive self join on the groups table
    START max = node:person(name = "Max")
    MATCH group <-[:BELONGS_TO*]- max
    RETURN group
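
What the `[:BELONGS_TO*]` pattern asks for is every group reachable over one or more hops. A rough Python equivalent is a breadth-first walk, sketched below. The group names and the `belongs_to` mapping are hypothetical sample data, and unlike a path query this sketch returns each group only once.

```python
from collections import deque

# Hypothetical sample data: member -> groups it BELONGS_TO directly.
belongs_to = {
    "Max": ["Developers"],
    "Developers": ["Engineering"],
    "Engineering": ["Company"],
    "Sales": ["Company"],
}


def groups_of(start):
    """Return every group reachable over one or more BELONGS_TO hops."""
    seen, order = set(), []
    queue = deque(belongs_to.get(start, []))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # already visited through another path
        seen.add(group)
        order.append(group)
        queue.extend(belongs_to.get(group, []))
    return order
```

Calling `groups_of("Max")` walks Developers, then Engineering, then Company, which is the traversal a recursive self join would have to emulate in SQL.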
  • Who uses this stuff?
  • [A] Mmm Pancakes
  • Cute meta + data
    This Material is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/
  • Neo4J Co-Existence
      • Node uuids as refs in external ElasticSearch, also in internal Lucene
      • Custom search ranking for user history based on node relationship data
      • MySQL for user data, Redis for metrics
  • [B] ACL from Hell: one of the top 10 telcos worldwide
  • Example: Access Authorization. Access may be given directly or by inheritance.
    [Figure: Users (U) linked to a hierarchy of Customers (C), Accounts (A) and Subscriptions (S); access links are marked Inherit = true or Inherit = false]
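
The slide only shows the idea in a figure, so the following is a guess at the rule rather than the telco's actual model: a user may access a resource if there is a direct grant on it, or a grant with inherit = true on one of its ancestors. The hierarchy, the user ids and the `can_access` helper are all hypothetical.

```python
# Hypothetical hierarchy: child -> parent (Customer > Account > Subscription).
parent = {"A1": "C1", "A2": "C1", "S1": "A1", "S2": "A1", "S3": "A2"}

# Hypothetical grants: (user, resource) -> inherit flag.
grants = {("U1", "C1"): True, ("U2", "A1"): False, ("U2", "A2"): True}


def can_access(user, resource):
    """Direct grant, or a grant with inherit=True on some ancestor."""
    if (user, resource) in grants:
        return True
    node = parent.get(resource)
    while node is not None:
        # Only inheriting grants on ancestors extend down the hierarchy.
        if grants.get((user, node)) is True:
            return True
        node = parent.get(node)
    return False
```

In a graph database the `while` loop corresponds to following the hierarchy relationships upwards from the resource until a matching access relationship is found.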
  • [C] Master of your Domain
  • [C] MDM within Cisco: master data management, sales compensation management, online customer support
    Description
      Real-time conflict detection in sales compensation management. Business-critical “P1” system. Neo4j allows Cisco to model complex algorithms while still maintaining high performance over a large dataset.
    Background
      Neo4j replaces Oracle RAC, which was not performant enough for the use case.
    Benefits
      • Performance: “Minutes to Milliseconds”. Outperforms Oracle RAC, serving complex queries in real time.
      • Flexibility: allows Cisco to model interconnected data and complex queries with ease.
      • Robustness: with 9+ years of production experience, Neo4j brings a solid product.
    Architecture
      • 3-node Enterprise cluster with mirrored disaster recovery cluster
      • Dedicated hardware in own datacenter
      • Embedded in custom webapp
    Sizing
      • 35 million nodes
      • 50 million relationships
      • 600 million properties
  • Webadmin
  • Graph -[:Cypher]-> JDBC
  • Today’s app: Cineasts
  • The domain model
  • Hands-on: Installation
      • copy zip drive contents to your box
      • cd cineasts
      • ./import.sh
      • neo4j/bin/neo4j-shell
      • http://localhost:7474
  • Some Links
    ๏ GraphConnect
    ๏ Community: http://neo4j.org
  • Cypher - overview
    ๏ a pattern-matching query language
    ๏ declarative grammar with clauses (like SQL)
    ๏ aggregation, ordering, limits
    ๏ create, read, update, delete
  • Cypher - read clauses
    ๏ Find a place to begin: START <lookup>
    ๏ Describe what to find: MATCH <paths>
    ๏ Filter the elements: WHERE <filters>
    ๏ RETURN <values>
  • Cypher: START + RETURN
    ๏ START <lookup> RETURN <expressions>
    ๏ START binds terms using simple look-up
      • directly using known ids
      • or based on indexed Property
    ๏ RETURN expressions specify result set
    // lookup node id 1, return that node
    start n=node(1) return n
    // lookup all nodes
    start n=node(*) return n
  • Cypher: MATCH
    ๏ START <lookup> MATCH <pattern> RETURN <expr>
    ๏ MATCH describes a pattern of nodes+relationships
      • node terms in optional parentheses
      • lines with arrows for relationships
    // lookup n, traverse any relationship to some m
    start n=node(0) match (n)--(m) return n,m;
    // any outgoing relationship from n to m
    start n=node(1) match n-->m return n,m;
    // only next relationships from n to m up to 3 away
    start n=node(1) match p=n-[:next*..3]->m return p;
    // from n to m and capture the relationship as r
    start n=node(1) match n-[r]->m return n,r,m
  • Cypher: WHERE
    ๏ START <lookup> [MATCH <pattern>] WHERE <condition> RETURN <expr>
    ๏ WHERE filters nodes or relationships
      • uses expressions to constrain elements
    // lookup all nodes as n, constrained to name Andreas
    start n=node(*) where n.name="Andreas" return n
    // filter nodes where age is less than 30
    start n=node(*) where n.age<30 return n
    // filter using a regular expression
    start n=node(*) where n.name =~ 'Tob.*' return n
    // filter for nodes where a property exists
    start n=node(*) where has(n.name) return n
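
The four WHERE examples can be mimicked over a plain list of dictionaries, which may help when reasoning about which nodes each filter keeps. This is only an analogy in Python: the sample nodes and the `where` helper are made up, a missing property is treated here as "does not match", and the regular expression is assumed to match the whole value.

```python
import re

# Made-up sample nodes; not every node has every property.
nodes = [
    {"id": 0, "name": "Andreas", "age": 35},
    {"id": 1, "name": "Tobias", "age": 25},
    {"id": 2, "name": "Peter"},
    {"id": 3, "age": 28},
]


def where(predicate):
    """Scan all nodes and return the ids of those that satisfy the predicate."""
    return [n["id"] for n in nodes if predicate(n)]


# where n.name = "Andreas"
by_name = where(lambda n: n.get("name") == "Andreas")
# where n.age < 30
under_30 = where(lambda n: "age" in n and n["age"] < 30)
# where n.name =~ 'Tob.*'
by_regex = where(lambda n: "name" in n
                 and re.fullmatch(r"Tob.*", n["name"]) is not None)
# where has(n.name)
has_name = where(lambda n: "name" in n)
```

Each call scans every node, just as `start n=node(*)` does, which is why the earlier slides recommend index look-ups as starting points on larger graphs.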
  • Cypher: CREATE
    ๏ CREATE <node>[,node or relationship] RETURN <expr>
      • create nodes with optional properties
      • create relationship (must have a type)
    // create an anonymous node
    create n
    // create node with a property, returning it
    create n={name:"Andreas"} return n
    // lookup 2 nodes, then create a relationship and return it
    start n=node(0),m=node(1) create n-[r:KNOWS]->m return r
    // lookup nodes, then create a relationship with properties
    start n=node(1),m=node(2) create n-[r:KNOWS {since:2008}]->m
  • Cypher: SET
    ๏ SET [<node property>] [<relationship property>]
      • update a property on a node or relationship
      • must follow a START
    // update the name property
    start n=node(0) set n.name="Peter"
    // update many nodes, using a calculation
    start n=node(*) set n.size=n.size+1
    // match & capture a relationship, update a property
    start n=node(1) match n-[r]-m set r.times=10
  • Cypher: DELETE
    ๏ DELETE [<node>|<relationship>|<property>]
      • delete a node, relationship or property
      • must follow a START
      • to delete a node, all relationships must be deleted first
    // delete a node
    start n=node(5) delete n
    // remove a node and all relationships
    start n=node(3) match n-[r]-() delete n, r
    // remove a property
    start n=node(3) delete n.age
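
The rule that a node's relationships must go before the node itself can be illustrated with a small Python sketch. `TinyGraph` and its methods are hypothetical names for this example, and raising `ValueError` is just one way to model the refusal; it says nothing about how Neo4j reports the error.

```python
class TinyGraph:
    """Toy graph that refuses to delete nodes that still have relationships."""

    def __init__(self, nodes, rels):
        self.nodes = set(nodes)
        self.rels = set(rels)  # (start, type, end)

    def delete_node(self, node):
        # Mirrors the slide: a node can only go once its relationships are gone.
        if any(node in (start, end) for start, _, end in self.rels):
            raise ValueError(f"node {node} still has relationships")
        self.nodes.discard(node)

    def delete_node_and_rels(self, node):
        """Analogue of: start n=node(3) match n-[r]-() delete n, r"""
        self.rels = {(s, t, e) for s, t, e in self.rels if node not in (s, e)}
        self.delete_node(node)


# Made-up sample data: node 5 is isolated, node 3 knows node 4.
g = TinyGraph(nodes=[3, 4, 5], rels=[(3, "KNOWS", 4)])
```

With this data, `g.delete_node(5)` succeeds straight away, `g.delete_node(3)` raises until the KNOWS relationship is removed, and `g.delete_node_and_rels(3)` removes both in one step.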