Intro to Graph Databases and Neo4j (SV JUG Apr, 2014)

An introduction to Graph Databases and Neo4j, given at the Silicon Valley Java User Group in April 2014. Abstract follows:

Hey, so did you know that graph databases are the fastest growing category in databases today? No? Don’t worry, most don’t. But today we will have as our guest the founder of the popular open source graph database Neo4j, to tell you the back-story.

Graph databases have been popularized by leading social web properties like Facebook and LinkedIn. So it’s only natural that most geeks out there equate graphs with social, but that’s only part of the story. Thousands of companies in a wide range of industries have quietly adopted Neo4j across a broad range of business critical uses. Forrester Research now predicts that by 2017, over 25% of enterprises will be using a graph database. So if you haven’t already encountered one, chances are you might soon!

Neo4j is mostly written in Java (some Scala!), but it is usable from all of the popular languages. This session will introduce fundamental graph database concepts and then dive into a hands-on, code-centric introduction to Neo4j. We will cover the declarative query language Cypher, introduce operational concerns such as clustering and horizontal scalability, and describe popular graph database use cases.

You will leave with a solid understanding of graph database fundamentals and an appreciation for when graph databases are a good fit in the real world.

Transcript

  • 1. Neo Technology, Inc Confidential
    Graph All Teh Things!!!11
    An Introduction to Graph Databases and Neo4j
    Emil Eifrem, emil@neotechnology.com, @emileifrem, #neo4j
    Silicon Valley JUG, 2014
    Thursday, April 17, 14
  • 2. Agenda
    • Graphs Are Eating The World
    • Wait! What Is A Graph Anyway?
    • Use Cases
  • 3. WARNING! ALL I’M OFFERING IS THE TRUTH
  • 4. Victims
  • 5. Victims
  • 6. Graphs Are Eating The World
  • 7. Graphs Are Eating The World
  • 8. Graphs Are Eating The World: “Graph analysis is the true killer app for Big Data.” http://blogs.forrester.com/james_kobielus/11-12-19-the_year_ahead_in_big_data_big_cool_new_stuff_looms_large
  • 9. Graphs Are Eating The World: “[I]t is arguable that graph databases will have a bigger impact on the database landscape than Hadoop or its competitors.” http://www.bloorresearch.com/blog/IM-Blog/2012/5/graph-databases-nosql.html
  • 10. Graphs Are Eating The World
  • 11. Graphs Are Eating The World: http://www.asterdata.com/resources/video-discovery-platform.php http://www.zdnet.com/teradata-aster-gets-graph-database-hdfs-compatible-file-store-7000021667/
  • 12. Graphs Are Eating The World: “Forrester estimates that over 25% of enterprises will be using graph databases by 2017” http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
  • 13. Graphs Are Eating The World. Source: http://db-engines.com/en/ranking
  • 14. Graphs Are Eating The World: Finance, Accenture
  • 15. What Is A Graph, Anyway?
  • 16. Graphs! Dancing With Michael Jackson Eating Brains
  • 17. Dancing With Michael Jackson Eating Brains. Not really. These are Charts!
  • 19. What Is “Graph Search,” Anyway?
  • 20. Cypher: Graph Patterns, ASCII art (diagram: A LOVES B)
    MATCH (A) -[:LOVES]-> (B)
    WHERE A.name = "A"
    RETURN B
  • 21. http://maxdemarzi.com/?s=facebook
    MATCH (me:Person)-[:IS_FRIEND_OF]->(friend:Person),
          (friend)-[:LIKES]->(restaurant),
          (restaurant)-[:LOCATED_IN]->(newyork:City),
          (restaurant)-[:SERVES]->(sushi:Cuisine)
    WHERE me.name = 'Emil' AND newyork.location='New York' AND sushi.cuisine='Sushi'
    RETURN restaurant.name
  • 23. Example HR Query (using SQL): “Find all direct reports and how many they manage, up to 3 levels down”
  • 24. Example HR Query (using SQL): “Find all direct reports and how many they manage, up to 3 levels down”
    (SELECT T.directReportees AS directReportees, sum(T.count) AS count
     FROM (
       SELECT manager.pid AS directReportees, 0 AS count
       FROM person_reportee manager
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       UNION
       SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count
       FROM person_reportee manager
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       GROUP BY directReportees
       UNION
       SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count
       FROM person_reportee manager
       JOIN person_reportee reportee ON manager.directly_manages = reportee.pid
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       GROUP BY directReportees
       UNION
       SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count
       FROM person_reportee manager
       JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
       JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       GROUP BY directReportees
     ) AS T
     GROUP BY directReportees)
    UNION
    (SELECT T.directReportees AS directReportees, sum(T.count) AS count
     FROM (
       SELECT manager.directly_manages AS directReportees, 0 AS count
       FROM person_reportee manager
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       UNION
       SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count
       FROM person_reportee manager
       JOIN person_reportee reportee ON manager.directly_manages = reportee.pid
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       GROUP BY directReportees
       UNION
       SELECT L1Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS count
       FROM person_reportee manager
       JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
       JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       GROUP BY directReportees
     ) AS T
     GROUP BY directReportees)
    UNION
    (SELECT T.directReportees AS directReportees, sum(T.count) AS count
     FROM (
       SELECT reportee.directly_manages AS directReportees, 0 AS count
       FROM person_reportee manager
       JOIN person_reportee reportee ON manager.directly_manages = reportee.pid
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       GROUP BY directReportees
       UNION
       SELECT L2Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS count
       FROM person_reportee manager
       JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
       JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
       WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
       GROUP BY directReportees
     ) AS T
     GROUP BY directReportees)
    UNION
    (SELECT L2Reportees.directly_manages AS directReportees, 0 AS count
     FROM person_reportee manager
     JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
     JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
     WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
    )
  • 25. Same Query in Cypher: “Find all direct reports and how many they manage, up to 3 levels down”
    MATCH (boss)-[:MANAGES*0..3]->(sub),
          (sub)-[:MANAGES*1..3]->(report)
    WHERE boss.name = "John Doe"
    RETURN sub.name AS Subordinate, count(report) AS Total
  • 26. What About Performance?
    Database | # persons | query time
    MySQL    | 1,000     |
    Neo4j    |           |
    Neo4j    |           |
    • a sample social graph
      • with ~1,000 persons
    • average 50 friends per person
    • pathExists(a,b) limited to depth 4
    • caches warmed up to eliminate disk I/O
  • 27. What About Performance?
    Database | # persons | query time
    MySQL    | 1,000     | 2,000 ms
    Neo4j    |           |
    Neo4j    |           |
  • 28. What About Performance?
    Database | # persons | query time
    MySQL    | 1,000     | 2,000 ms
    Neo4j    | 1,000     | 2 ms
    Neo4j    |           |
  • 29. What About Performance?
    Database | # persons | query time
    MySQL    | 1,000     | 2,000 ms
    Neo4j    | 1,000     | 2 ms
    Neo4j    | 1,000,000 |
  • 30. What About Performance?
    Database | # persons | query time
    MySQL    | 1,000     | 2,000 ms
    Neo4j    | 1,000     | 2 ms
    Neo4j    | 1,000,000 | 2 ms
  • 31. What About Performance? “Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code.” - Volker Pacher, Senior Developer, eBay
  • 32. Real-Time/OLTP, Offline/Batch, Connected Data
  • 33. Use Cases
  • 34. Network Impact Analysis
  • 35. Network Management Example (Network Graph)
  • 36. Practical Cypher: Network Management - Create
    CREATE
      (crm {name:"CRM"}),
      (dbvm {name:"Database VM"}),
      (www {name:"Public Website"}),
      (wwwvm {name:"Webserver VM"}),
      (srv1 {name:"Server 1"}),
      (san {name:"SAN"}),
      (srv2 {name:"Server 2"}),
      (crm)-[:DEPENDS_ON]->(dbvm),
      (dbvm)-[:DEPENDS_ON]->(srv2),
      (srv2)-[:DEPENDS_ON]->(san),
      (www)-[:DEPENDS_ON]->(dbvm),
      (www)-[:DEPENDS_ON]->(wwwvm),
      (wwwvm)-[:DEPENDS_ON]->(srv1),
      (srv1)-[:DEPENDS_ON]->(san)
  • 37. Practical Cypher: Network Management - Impact Analysis
    // Server 1 Outage
    MATCH (n)<-[:DEPENDS_ON*]-(upstream)
    WHERE n.name = "Server 1"
    RETURN upstream
    Result:
    upstream
    {name:"Webserver VM"}
    {name:"Public Website"}
  • 38. Practical Cypher: Network Management - Dependency Analysis
    // Public website dependencies
    MATCH (n)-[:DEPENDS_ON*]->(downstream)
    WHERE n.name = "Public Website"
    RETURN downstream
    Result:
    downstream
    {name:"Database VM"}
    {name:"Server 2"}
    {name:"SAN"}
    {name:"Webserver VM"}
    {name:"Server 1"}
  • 39. Practical Cypher: Network Management - Statistics
    // Most depended on component
    MATCH (n)<-[:DEPENDS_ON*]-(dependent)
    RETURN n, count(DISTINCT dependent) AS dependents
    ORDER BY dependents DESC
    LIMIT 1
    Result:
    n            | dependents
    {name:"SAN"} | 6
  • 40. Route Finding
  • 41. Recommendations
  • 42. Logistics
  • 43. Access Control
  • 44. Fraud Detection
  • 45. Securities and Debt
  • 46. Basically: Graph All The Things!!!1
  • 47. Gartner’s “5 Graphs” (Ref: http://www.gartner.com/id=2081316)
    • Social Graph
    • Interest Graph
    • Payment Graph
    • Intent Graph
    • Mobile Graph
  • 48. 5 Graphs of Telco
    • Network Graph (e.g. Network Dependency Analysis, Network Inventory, etc.)
    • Social Graph (mobile apps, social recommendations, collaboration)
    • Call Graph (creating inferred social graph, churn reduction, etc.)
    • Master Data Graph (org & product hierarchy, data governance, IAM)
    • Help Desk Graph (enterprise collaboration)
  • 49. 5 Graphs of Finance
    • Payment Graph (e.g. Fraud Detection, Credit Risk Analysis, Chargebacks...)
    • Customer Graph (org drillthru, product recommendations, mobile payments, etc.)
    • Entitlement Graph (identity & access management, authorization)
    • Asset Graph (portfolio analytics, risk management, market & sentiment analysis, compliance)
    • Master Data Graph (enterprise collaboration, corporate hierarchy, data governance)
  • 50. 5 Graphs of Health Care
    • Provider Graph (e.g. referrals, patient management, research)
    • Patient Graph (support communities, doctor recommendations, clinical trials)
    • Bioinformatic Graph (drug research, genetic screening, bioengineering, etc.)
    • Master Data Graph (biological master data, evolutionary taxonomy, the access control graph, etc.)
    • Treatment Graph (collaborative medicine, clinical trials, etc.)
  • 51. Graph Databases: The Definitive Book on Graph Databases. Available as a free PDF download at graphdatabases.com!
  • 52. Brown Bag Lunch. By request only!
    • you bring 10+ colleagues
    • you provide a room with a projector + screen
    • we bring a bag lunch
    • we introduce Neo4j to your team in 45 min + 15 min for Q&A
    Schedule your Neo4j Intro now!
  • 53.
    • The premier source for training on Graph Databases and Neo4j
    • Join us for the ‘All You Can Graph Day’ on May 9 in Palo Alto!
    • Sign up at graphacademy.com
  • 54. GRAPHCONNECT SF 2014
    • Wednesday, Oct. 22 - graphconnect.com
    • Only conference focused on graphDBs and applications powered by graphs
    • Register now for $99 Alpha Geek Passes
  • 55. UPCOMING EVENTS
    • 4/24 - GraphPUB Silicon Valley at Tied House
    • 5/1 - Uncovering Invisible Relationships with a GraphDB at Medium
    • 5/6 - GraphPANEL Silicon Valley at AOL
    • And more at: meetup.com/graphdb-sf
  • 57. teh end (sic) stay connected
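
The Cypher query on slide 25 matches two variable-length patterns: everyone 0 to 3 MANAGES hops below the boss, and for each of them everyone 1 to 3 hops further down. The sketch below mirrors that in plain Python to show what the result looks like. It is not how Neo4j evaluates the query. The org chart, the dictionary layout, and the helper names `within` and `reports_summary` are all hypothetical, and the sketch assumes a strict tree (one manager per person), so each pair is connected by exactly one path.

```python
# Hypothetical org chart: manager name -> people they directly manage.
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Cid": ["Eve"],
}


def within(manages, start, min_hops, max_hops):
    """Yield everyone between min_hops and max_hops MANAGES steps below start."""
    level = [start]
    for hops in range(max_hops + 1):
        if hops >= min_hops:
            yield from level
        # Move one level down the hierarchy.
        level = [report for person in level for report in manages.get(person, [])]


def reports_summary(manages, boss):
    """Mirror of the slide's query: (boss)-[*0..3]->(sub), (sub)-[*1..3]->(report)."""
    summary = {}
    for sub in within(manages, boss, 0, 3):
        total = sum(1 for _ in within(manages, sub, 1, 3))
        if total:  # MATCH yields no row when the second pattern finds nothing
            summary[sub] = total
    return summary


print(reports_summary(MANAGES, "John Doe"))
```

Because the first pattern allows zero hops, the boss appears in the result too, and people who manage nobody are left out, which follows from the query as written on the slide.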
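
The benchmark on slides 26 to 30 is described as "pathExists(a,b) limited to depth 4". The slides do not show how that check is implemented, so the following is only a minimal sketch of one way to do it: a breadth-first search that stops expanding at a hop limit. The function name `path_exists`, the adjacency-dictionary layout, and the toy friendship chain are assumptions made for illustration.

```python
from collections import deque


def path_exists(friends, a, b, max_depth=4):
    """Return True if b can be reached from a in at most max_depth hops."""
    if a == b:
        return True
    visited = {a}
    frontier = deque([(a, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend == b:
                return True
            if friend not in visited:
                visited.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Hypothetical toy data: a chain of friendships 0 - 1 - 2 - 3 - 4 - 5.
chain = {i: [i - 1, i + 1] for i in range(1, 5)}
chain[0] = [1]
chain[5] = [4]

print(path_exists(chain, 0, 4))  # 4 hops away, within the limit
print(path_exists(chain, 0, 5))  # 5 hops away, beyond the limit
```

The work done depends on how many people sit within four hops of the start, not on the total size of the graph, which is one plausible reading of why the Neo4j timings on slide 30 stay flat as the data set grows.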
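
The network management queries on slides 36 to 39 all follow DEPENDS_ON relationships transitively, in one direction or the other. As a rough sketch of what those traversals compute, the Python below loads the same edges as the CREATE statement on slide 36 and reproduces the three results shown on the slides. The helper names (`build_adjacency`, `reachable`, `upstream`, `downstream`, `most_depended_on`) are made up for this example, and the code is a plain graph walk, not Neo4j's engine.

```python
from collections import defaultdict

# Edges copied from slide 36: (a, b) means a DEPENDS_ON b.
DEPENDS_ON = [
    ("CRM", "Database VM"),
    ("Database VM", "Server 2"),
    ("Server 2", "SAN"),
    ("Public Website", "Database VM"),
    ("Public Website", "Webserver VM"),
    ("Webserver VM", "Server 1"),
    ("Server 1", "SAN"),
]


def build_adjacency(edges, reverse=False):
    """Map each node to its neighbours, optionally following edges backwards."""
    adjacency = defaultdict(set)
    for source, target in edges:
        if reverse:
            source, target = target, source
        adjacency[source].add(target)
    return adjacency


def reachable(adjacency, start):
    """Return every node reachable from start over one or more edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def upstream(name):
    """Components affected if `name` fails (slide 37)."""
    return reachable(build_adjacency(DEPENDS_ON, reverse=True), name)


def downstream(name):
    """Components that `name` relies on (slide 38)."""
    return reachable(build_adjacency(DEPENDS_ON), name)


def most_depended_on():
    """Component with the most distinct transitive dependents (slide 39)."""
    nodes = {node for edge in DEPENDS_ON for node in edge}
    return max(((node, len(upstream(node))) for node in nodes), key=lambda pair: pair[1])


print(sorted(upstream("Server 1")))
print(sorted(downstream("Public Website")))
print(most_depended_on())
```

Collecting reachable nodes into a set plays the role of `count(DISTINCT dependent)` on slide 39: SAN is reachable from Public Website along two different paths but is counted once.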