2012 09 SF Data Mining zero to hero

This is the presentation given by Michael Hunger and Peter Neubauer at the SF Data Mining group, see http://www.meetup.com/Data-Mining/events/80275492/

  • There are a number of different ways to query a graph database. This one aims to make querying easy, and to produce queries that are readable. We looked at alternatives - SPARQL, SQL, Gremlin and other...
  • Visualization done with GraphViz. A user will have many such “stacks”.
  • Search ranking weighs inbound and outbound node connections as part of the search score calculation.

Transcript

  • 1. Neo4j - from 0 to Her0. Thank you Trulia! Peter Neubauer (@peterneubauer), Andreas Kollegger (@akollegger), Michael Hunger (@mesirii). #neo4j
  • 7. Um, Neo for what? • Neo4j - the graph database • Graph • no: not for charts & diagrams, or vector artwork • yes: for storing data that is connected • remember linked lists, trees? • graphs are the general-purpose data structure
  • 16. Neo4j is a Graph Database ๏ A Graph Database: • a Property Graph with Nodes, Relationships and Properties on both • perfect for complex, highly connected data ๏ A Graph Database: • reliable with real ACID Transactions • scalable: 32 Billion Nodes, 32 Billion Relationships, 64 Billion Properties • Server with REST API, or Embeddable on the JVM • high-performance with High-Availability (read scaling)
  • 17. Graph DB 101
  • 23. You know relational [figure: tables foo and bar, linked through the join table foo_bar]
  • 27. You know relational, now consider relationships...
  • 38. We're talking about a Property Graph: Nodes, Relationships, Properties (each a key+value) + Indexes (for easy look-ups) [figure: person nodes such as Emil, Johan, Allison, Tobias, Lars, Andreas, Andrés, Peter, Ian, Delia and Michael, connected by "knows" relationships]
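To make the property graph model on this slide concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j stores data; the class names, the `name_index` dictionary and the `since` property are assumptions chosen for illustration. It only shows the four ingredients named on the slide: nodes, relationships, properties on both, and an index for look-ups.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int  # id of the start node
    type: str   # every relationship has a type, e.g. "knows"
    end: int    # id of the end node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes and relationships both carry key+value properties."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []
        self.name_index = {}  # hypothetical index on the "name" property

    def add_node(self, **properties):
        node = Node(id=len(self.nodes), properties=properties)
        self.nodes[node.id] = node
        if "name" in properties:
            self.name_index[properties["name"]] = node.id
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start.id, rel_type, end.id, properties)
        self.relationships.append(rel)
        return rel


graph = PropertyGraph()
emil = graph.add_node(name="Emil")
johan = graph.add_node(name="Johan")
knows = graph.relate(emil, "knows", johan, since=2008)  # "since" is an illustrative property
```

The index is what a query uses to find a starting node by value; everything after that is followed through relationships rather than looked up again.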
  • 41. And, but, so how do you query this "graph" database?
  • 42. Cypher Query Language
  • 43. What is Cypher? • Graph Query Language for Neo4j • Aims to make querying simple
  • 45. Design Decisions: Pattern matching [figure: nodes A, B and C]
  • 51. Design Decisions: ASCII-art patterns: () --> ()
  • 53. Design Decisions: Directed relationship: (A) --> (B)
  • 55. Design Decisions: Undirected relationship: (A) -- (B)
  • 57. Design Decisions: Specific relationships: A -[:LOVES]-> B
  • 59. Design Decisions: Joined paths: A --> B --> C
  • 62. Design Decisions: Multiple paths: A --> B --> C, A --> C, or written as one pattern: A --> B --> C <-- A
  • 64. Design Decisions: Optional relationships: A -[?]-> B
  • 65. SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1
  • 66. START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
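The two slides above contrast a relational join with a graph pattern. The following Python sketch runs the SQL side with the standard-library `sqlite3` module and mimics the graph side with a plain adjacency dictionary. The sample rows and the `level` column are invented for illustration; only the table names and join conditions come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level INTEGER);
    INSERT INTO users VALUES (1, 'Max'), (2, 'Anna');
    INSERT INTO skills VALUES (10, 'Java'), (11, 'Cypher');
    INSERT INTO user_skill VALUES (1, 10, 3), (1, 11, 5), (2, 10, 4);
""")

# Relational version: two joins through the user_skill join table.
relational_rows = conn.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.name
""").fetchall()

# Graph version: the join table becomes relationships stored with the user,
# so the "query" is a single hop from the start node.
outgoing = {
    1: [("Java", 3), ("Cypher", 5)],  # (skill name, relationship property "level")
    2: [("Java", 4)],
}
graph_rows = sorted(outgoing[1])

print(relational_rows)  # [('Cypher', 5), ('Java', 3)]
print(graph_rows)       # [('Cypher', 5), ('Java', 3)]
```

Both sides return the same answer; the difference the slides point at is that the graph form names the relationship directly instead of reconstructing it from foreign keys.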
  • 67. Indexes: Used as multiple starting points, not to speed up any traversals. START a = node:nodes_index(type="User") MATCH a-[r:knows]-b RETURN ID(a), ID(b), r.weight
  • 68. Variable length Path Match. In SQL: some UGLY recursive self join on the groups table. In Cypher: START max=node:person(name="Max") MATCH group <-[:BELONGS_TO*]- max RETURN group
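What the variable-length pattern `[:BELONGS_TO*]` asks for can be sketched as a breadth-first walk: start at Max and keep following BELONGS_TO relationships for as many hops as exist. The group names below are made up, and the function returns each reachable group once, whereas a Cypher query would return one row per matching path; treat it as an illustration of the idea, not of Neo4j's traversal engine.

```python
from collections import deque

# Hypothetical data: each key BELONGS_TO every entry in its list.
belongs_to = {
    "Max": ["Developers"],
    "Developers": ["Engineering"],
    "Engineering": ["Company"],
    "Company": [],
}


def reachable_groups(start, edges):
    """Follow BELONGS_TO edges for one or more hops and collect every group reached."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for group in edges.get(current, []):
            if group not in seen:
                seen.add(group)
                order.append(group)
                queue.append(group)
    return order


groups = reachable_groups("Max", belongs_to)
print(groups)  # ['Developers', 'Engineering', 'Company']
```

In a relational schema the same question needs one self join per level of nesting, or a recursive query, which is the contrast the slide draws.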
  • 70. Who uses this stuff?
  • 71. [A] Mmm Pancakes
  • 73. Cute meta + data. This Material is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/
  • 75. Neo4J Co-Existence • Node uuids as refs in external ElasticSearch, also in internal Lucene • Custom search ranking for user history based on node relationship data • MySQL for user data, Redis for metrics
  • 76. [B] ACL from Hell: One of the top 10 telcos worldwide
  • 77. Example Access Authorization: Access may be given directly or by inheritance [figure: a hierarchy of User (U), Customer (C), Account (A) and Subscription (S) nodes; access relationships are labelled Inherit = true or Inherit = false]
  • 78. [C] Master of your Domain
  • 79. [C] MDM within Cisco: master data management, sales compensation management, online customer support
    Description: Real-time conflict detection in sales compensation management. Business-critical “P1” system. Neo4j allows Cisco to model complex algorithms, while still maintaining high performance over a large dataset.
    Background: Neo4j replaces Oracle RAC, which was not performant enough for the use case.
    Benefits: Performance: “Minutes to Milliseconds”. Outperforms Oracle RAC, serving complex queries in real time. Flexibility: Allows Cisco to model interconnected data and complex queries with ease. Robustness: With 9+ years of production experience, Neo4j brings a solid product.
    Architecture: 3-node Enterprise cluster with mirrored disaster recovery cluster. Dedicated hardware in own datacenter. Embedded in custom webapp.
    Sizing: 35 million nodes, 50 million relationships, 600 million properties
  • 81. Webadmin
  • 83. Graph -[:Cypher]-> JDBC
  • 85. Today’s app: Cineasts
  • 87. The domain model
  • 90. Hands-on: Installation • copy zip drive contents to your box • cd cineasts • ./import.sh • neo4j/bin/neo4j-shell • http://localhost:7474
  • 91. Some Links ๏ GraphConnect ๏ Community: http://neo4j.org
  • 93. Cypher - overview ๏ a pattern-matching query language ๏ declarative grammar with clauses (like SQL) ๏ aggregation, ordering, limits ๏ create, read, update, delete
  • 95. Cypher - read clauses ๏ Find a place to begin: • START <lookup> ๏ Describe what to find: • MATCH <paths> ๏ Filter the elements: • WHERE <filters> ๏ Specify the results: • RETURN <values>
  • 98. Cypher: START + RETURN ๏ START <lookup> RETURN <expressions> ๏ START binds terms using simple look-up • directly using known ids • or based on indexed Property ๏ RETURN expressions specify result set
    // lookup node id 1, return that node
    start n=node(1) return n
    // lookup all nodes
    start n=node(*) return n
  • 101. Cypher: MATCH ๏ START <lookup> MATCH <pattern> RETURN <expr> ๏ MATCH describes a pattern of nodes+relationships • node terms in optional parentheses • lines with arrows for relationships
    // lookup n, traverse any relationship to some m
    start n=node(0) match (n)--(m) return n,m;
    // any outgoing relationship from n to m
    start n=node(1) match n-->m return n,m;
    // only next relationships from n to m up to 3 away
    start n=node(1) match p=n-[:next*..3]->m return p;
    // from n to m and capture the relationship as r
    start n=node(1) match n-[r]->m return n,r,m
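The bounded pattern `n-[:next*..3]->m` from the slide above can be read as "every path of one to three outgoing `next` hops starting at n". The Python sketch below enumerates such paths over a hypothetical chain of node ids. It assumes a relationship is not reused within one path; that is a modelling choice for this example rather than a statement about the Neo4j matcher.

```python
def paths_up_to(start, edges, max_hops):
    """Return every path of 1..max_hops outgoing hops that begins at `start`."""
    results = []

    def extend(path, used):
        if len(path) - 1 == max_hops:  # path already has max_hops relationships
            return
        for target in edges.get(path[-1], []):
            hop = (path[-1], target)
            if hop in used:  # do not walk the same relationship twice
                continue
            new_path = path + [target]
            results.append(new_path)
            extend(new_path, used | {hop})

    extend([start], frozenset())
    return results


# Hypothetical "next" relationships forming a chain 1 -> 2 -> 3 -> 4 -> 5.
next_edges = {1: [2], 2: [3], 3: [4], 4: [5]}
paths = paths_up_to(1, next_edges, 3)
print(paths)  # [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
```

Node 5 is four hops away, so it never appears, which is the effect of the `..3` upper bound.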
  • 104. Cypher: WHERE ๏ START <lookup> [MATCH <pattern>] WHERE <condition> RETURN <expr> ๏ WHERE filters nodes or relationships • uses expressions to constrain elements
    // lookup all nodes as n, constrained to name Andreas
    start n=node(*) where n.name="Andreas" return n
    // filter nodes where age is less than 30
    start n=node(*) where n.age<30 return n
    // filter using a regular expression
    start n=node(*) where n.name =~ 'Tob.*' return n
    // filter for nodes where a property exists
    start n=node(*) where has(n.name) return n
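The four WHERE examples above are ordinary predicates applied to each candidate node. A rough Python equivalent over a list of property dictionaries follows. The sample nodes and ages are invented, and skipping nodes that lack the property is a simplification of this sketch, not a claim about how Cypher treats missing properties.

```python
import re

# Hypothetical nodes, each represented only by its properties.
nodes = [
    {"name": "Andreas", "age": 35},
    {"name": "Tobias", "age": 25},
    {"name": "Peter"},
    {"age": 28},
]

# where n.name = "Andreas"
by_name = [n for n in nodes if n.get("name") == "Andreas"]

# where n.age < 30 (nodes without an age are skipped in this sketch)
younger_than_30 = [n for n in nodes if "age" in n and n["age"] < 30]

# where n.name =~ 'Tob.*' (fullmatch assumes the pattern must cover the whole value)
matches_regex = [n for n in nodes if "name" in n and re.fullmatch(r"Tob.*", n["name"])]

# where has(n.name)
has_name = [n for n in nodes if "name" in n]

print([n["name"] for n in matches_regex])  # ['Tobias']
print(len(has_name))                       # 3
```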
  • 107. Cypher: CREATE ๏ CREATE <node>[,node or relationship] RETURN <expr> • create nodes with optional properties • create relationship (must have a type)
    // create an anonymous node
    create n
    // create node with a property, returning it
    create n={name:"Andreas"} return n
    // lookup 2 nodes, then create a relationship and return it
    start n=node(0),m=node(1) create n-[r:KNOWS]->m return r
    // lookup nodes, then create a relationship with properties
    start n=node(1),m=node(2) create n-[r:KNOWS {since:2008}]->m
  • 110. Cypher: SET ๏ SET [<node property>] [<relationship property>] • update a property on a node or relationship • must follow a START
    // update the name property
    start n=node(0) set n.name="Peter"
    // update many nodes, using a calculation
    start n=node(*) set n.size=n.size+1
    // match & capture a relationship, update a property
    start n=node(1) match n-[r]-m set r.times=10
  • 111. 48
  • 112. Cypher: DELETE๏ DELETE [<node>|<relationship>|<property>] •delete a node, relationship or property •must follow a START •tofirst a node, all relationships must be deleted delete 48
  • 113. Cypher: DELETE๏ DELETE [<node>|<relationship>|<property>] •delete a node, relationship or property •must follow a START •tofirst a node, all relationships must be deleted delete // delete a node start n=node(5) delete n // remove a node and all relationships start n=node(3) match n-[r]-() delete n, r // remove a property start n=node(3) delete n.age 48