Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Neo4j GraphTalk Oslo - Introduction to Graphs


Published on

Neo4j GraphTalk Oslo 2019
Rik Van Bruggen, Neo4j

Published in: Software
  • Be the first to comment

  • Be the first to like this

Neo4j GraphTalk Oslo - Introduction to Graphs

  1. 1. Welcome to the Oslo 1 Tuesday, May 28th 2019
  2. 2. Introduction 2
  3. 3. • 9:00 - 9:30 - Breakfast Networking - Welcome • 9:30 - 12:30 - Presentations • Introduction to the Neo4j Graph Platform Rik Van Bruggen, Neo4j • Exploring networks of companies to predict default probabilities Aiko Yamashita, DNB • Building Intelligent Solutions with Graphs Dinuke Abeysekera, Neo4j • 12:30 - Q&A & Networking 3 Our agenda for today
  4. 4. About Neo4j Rik Van Bruggen @rvanbruggen 4
  5. 5. Neo4j is an enterprise-grade native graph platform that enables you to: • Store, reveal and query data relationships • Traverse and analyze any levels of depth in real-time • Add context and connect new data on the fly 5 Who We Are: Leader in Graph Innovations • Performance • ACID Transactions • Schema-free Agility • Graph Algorithms Designed, built and tested natively for graphs from the start for: • Developer Productivity • Hardware Efficiency • Global Scale • Graph Adoption Graph Transactions Graph Analytics Data Integration Development & Admin Analytics Tooling Drivers & APIs Discovery & Visualization
  6. 6. 720+ 7/10 12/25 8/10 53K+ 100+ 300+ 450+ Adoption Top Retail Firms Top Financial Firms Top Software Vendors Customers Partners • Creator of the Neo4j Graph Platform • ~250 employees • HQ in Silicon Valley, other offices include London, Munich, Paris and Malmö Sweden • $80M new funding led by Morgan Stanley & One Peak. Total $160M from Fidelity, Sunstone, Conor, Creandum, and Greenbridge Capital • Over 15M+ downloads & container pulls • 325+ enterprise subscription customers with over half with >$1B in revenue Ecosystem Startups in program Enterprise customers Partners Meet up members Events per year Industry’s Largest Dedicated Investment in Graphs Neo4j - The Graph Company
  7. 7. Strictly ConfidentialStrictly Confidential 7 Neo4j Is Helping The World To Make Sense of Data ICIJ used Neo4j to uncover the world’s largest journalistic leak to date, The Panama Papers NASA uses Neo4j for a “Lessons Learned” database to improve effectiveness in search missions in space Neo4j is used to graph the human body, map correlations, identify cause & effect and search for the cure for cancer SAVING DEMOCRACY MISSION TO MARS CURING CANCER
  8. 8. The Neo4j Graph Platform Rik Van Bruggen @rvanbruggen 8
  9. 9. Networks of People Business Processes Knowledge Networks E.g., Risk management, Supply chain, Payments E.g., Employees, Customers, Suppliers, Partners, Influencers E.g., Enterprise content, Domain specific content, eCommerce content Data connections are increasing as rapidly as data volumes The Rise of Connections in Data Electronic Networks On-prem & cloud computing, Cellular, Telco & Internet, IoT, Blockchain
  10. 10. 10 Graph Databases are Designed for Connected Data TRADITIONAL DATABASES BIG DATA TECHNOLOGY Store and retrieve data Aggregate and filter data Connections in data Real time storage & retrieval Real-Time Connected Insights Long running queries aggregation & filtering “Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code” Volker Pacher, Senior Developer Up to 3 Max # of hops 1 Millions
  11. 11. CAR name: “Dan” born: May 29, 1970 twitter: “@dan” name: “Ann” born: Dec 5, 1975 since: Jan 10, 2011 brand: “Volvo” model: “V70” Latitude: 37.5629900° Longitude: -122.3255300° Nodes • Can have Labels to classify nodes • Labels have native indexes Relationships • Relate nodes by type and direction Properties • Attributes of Nodes & Relationships • Stored as Name/Value pairs • Can have indexes and composite indexes • Visibility security by user/role Neo4j Invented the Labeled Property Graph Model MARRIED TO LIVES WITH PERSON PERSON 11
  12. 12. Collections-Focused Multi-Model, Documents, Columns & Simple Tables, Joins Neo4j is designed for data relationships Different Paradigms NoSQL Relational DBMS Neo4j Graph Platform Connections-Focused Focused on Data Relationships Development Benefits Easy model maintenance Easy query Deployment Benefits Ultra high performance Minimal resource usage
  13. 13. How Neo4j Fits — Common Architecture Patterns From Disparate Silos To Cross-Silo Connections From Tabular Data To Connected Data From Data Lake Analytics to Real-Time Operations
  14. 14. Cypher: Powerful & Expressive Query Language MATCH (:Person { name:“Dan”} ) -[:MARRIED_TO]-> (spouse) MARRIED_TO Dan Ann NODE RELATIONSHIP TYPE LABEL PROPERTY VARIABLE
  15. 15. Graph Transactions Graph Analytics Data Integration Development & Admin Analytics Tooling Drivers & APIs Discovery & Visualization Developers Admins Applications Business Users Data Analysts Data Scientists Enterprise Data Hub Native Graph Platform: Tools for Many Users
  16. 16. 16 Strongly Differentiated Commercial Offering Enterprise Edition is Highly Differentiated from CE Full Text Indexing & Search2 ✔ ✔ Date/Time data type1 ✔ ✔ 3D Geospatial data types1 ✔ ✔ Native Indexes – up to 5x faster writes1 ✔ ✔ 100B+ Bulk Importer1 ✔ Resumable1 Enterprise Cypher Runtime up to 70% faster – ✔ Hot Backups – 2x Faster1 ACID Transactions ✔ ✔ High-performance native API ✔ ✔ High-performance caching ✔ ✔ Cost-based query optimizer ✔ ✔ Index-backed ORDER BY sorting in Cypher2 ✔ ✔ Graph algorithms library to support AI initiatives ✔ ✔ Massively parallel graph algorithms – ✔ Query monitoring with enriched metrics – ✔ User and role-based security – ✔ Brute force login prevention settings2 – ✔ LDAP and Active Directory Integration – ✔ Kerberos security option – ✔ Multi-Clustering (partition of clusters)1 – ✔ Automatic Cache Warming1 – ✔ Rolling Upgrades1 – ✔ Resumable Copy/ Restore Cluster Member – ✔ New diagnostic metrics and support tools1 – ✔ Property Blacklisting – ✔ Language drivers for Java, Python, Go2, C# & JavaScript ✔ ✔ Bolt Binary Protocol & Seabolt, C-based driver framework2 ✔ ✔ RPM, Debian, Docker, Azure & AWS Cloud Delivery ✔ ✔ Intra-cluster encryption secures traffic & server comms2 across data centers & cloud zones – ✔ IPv6 support in clustered deployments – Available High throughput, least-connected load balancing built into Bolt drivers – ✔ Causal Clustering, core and read-replica design at global scale for applications, analytics workflows, HA and DR – ✔ Enterprise Lock Manager accesses all cores on server – ✔ Labeled property graph model ✔ ✔ Native graph processing & storage ✔ ✔ Cypher graph query language ✔ ✔ Neo4j Browser with syntax highlighting ✔ ✔ Fast writes via native indexes2 ✔ ✔ Composite Indexes ✔ ✔ Cypher for Apache Spark (CAPS) for big data analytics ✔ ✔ Graph size limitations 34B nodes None Auto reuse of deleted space – ✔ Property existence constraints – ✔ Cypher query tracing, monitoring and metrics – ✔ Node Key schema constraints – ✔ Neo4j Desktop: Free developer-friendly package with full database and tools – ✔ CommunityDatabase Features Architectural Features Graph Platform Features 1New in Neo4j 3.4 & 23.5 Enterprise Community Enterprise Community Enterprise
  17. 17. Neo4j Bloom 17 • High fidelity • Scene navigation • Property views • Search suggestions • Saved phrase history • Property editor • Schema perspectives • Bloom chart type • Visualize • Communicate • Discover • Navigate • Isolate • Edit • Share
  18. 18. Graph & ML Algorithms in Neo4j+35 graph-algorithms- book/ Pathfinding & Search Centrality / Importance Community Detection Link Prediction Finds optimal paths or evaluates route availability and quality Determines the importance of distinct nodes in the network Detects group clustering or partition options Evaluates how alike nodes are Estimates the likelihood of nodes forming a future relationship Similarity
  19. 19. Graph and ML Algorithms in Neo4j • Parallel Breadth First Search & DFS • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi- Step • Balanced Triad (identification) • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity Pathfinding & Search Centrality / Importance Community Detection Similarity graph-algorithms/current/ Updated April 2019 Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors
  20. 20. 1. Knowledge Graphs Context for Decisions 2. Connected Feature Extraction Context for Credibility 4. AI Explainability3. Graph- Accelerated AI Context for Efficiency Context for Accuracy Four Pillars of Graph-Enhanced AI
  21. 21. Graph Database Surging in Popularity 21
  22. 22. 20M+ Downloads 8M+ from Neo4j Distribution 12M+ from Docker Events 400+ Approximate Number of Neo4j Events per Year 50k+ Meetups Number of Meetup Members Globally 50k+ Trained/certified Neo4j professionals 1k Certified Trained Developers Largest Pool of Graph Technologists
  23. 23. "Neo4j continues to dominate the graph database market.” “69% of enterprises have, or are planning to implement graphs over next 12 months” October, 2017 “The most widely stated reason in the survey for selecting Neo4j was to drive innovation” February, 2018 Critical Capabilities for DBMA “In fact, the rapid rise of Neo4j and other graph technologies may signal that data connectedness is indeed a separate paradigm from the model consolidation happening across the rest of the NoSQL landscape.” March, 2018 Analysts See Unique Benefits of Graphs "Neo4j is the clear market leader in the graph space. It has the most users, it uses a widely adopted language that is much easier than Gremlin and in many respects, it has consistently been a lot more innovative than its competitors.” “It is the Oracle or SQL Server of the graph database world.” March, 2019 "Our research suggests that graph databases have the best chance to survive and thrive as a distinct category (versus the other NoSQL models) because connected data applications present serious performance problems that only a specialized graph DB can solve.” March, 2019
  24. 24. Graph Use Cases 24
  25. 25. Recommendations Dynamic Pricing IoT-applicationsFraud Detection Real-Time Transaction Applications Generate and Protect Revenue Customer Engagement Metadata and Advanced Analytics Data Lake Integration Knowledge Graphs for AI Risk Mitigation Generate Actionable Insights Network Management Supply Chain Efficiency Identity and Access Management Internal Business Processes Improve Efficiency and Cut Costs Graph Use Cases by Value Proposition
  26. 26. 26 Real-Time Recommendations Fraud Detection Network & IT Operations Master Data Management Knowledge Graph Identity & Access Management Common Graph Technology Use Cases AirBnb
  27. 27. Software Financial Services Telecom Retail & Consumer Goods Media & Entertainment Other Industries Airbus 27 Copyright © 2017 Neo4j, Inc. Company Confidential
  28. 28. Graphs Drive Innovation 28 Context Paths Auto-Graphs Graph Layers 1st Graph Cross-Connect Cross-tech applications Internet of Things operations Transparent Neural Networks Blockchain-managed systems Adjacent graph layers inspire new innovations Metadata / Risk Management Knowledge Graphs AI- Powered Customer Experiences Connect unlike objects such as people to products, locations Mobile app explosion Recommendation engines Fraud detectors Desire for more context to follow connections Connects like objects People, computer networks, telco, etc
  29. 29. Business Problem • Find relationships between people, accounts, shell companies and offshore accounts • Journalists are non-technical • Biggest “Snowden-Style” document leak ever; 11.5 million documents, 2.6TB of data Solution and Benefits • Pulitzer Prize winning investigation resulted in robust coverage of fraud and corruption • PM of Iceland & Pakistan resigned, exposed Putin, Prime Ministers, gangsters, celebrities (Messi) • Led to assassination of journalist in Malta Background • International Consortium of Investigative Journalists (ICIJ), small team of data journalists • International investigative team specializing in cross-border crime, corruption and accountability of power • Works regularly with leaks and large datasets ICIJ Panama Papers INVESTIGATIVE JOURNALISM Fraud Detection / Knowledge Graph29
  31. 31. Bank US Account Person A Company Bank Bahamas AddressNODE RELATIONSHIP Person B
  32. 32. ICIJ Pulitzer Price Winner 2017
  33. 33. Paradise Papers Metadata Model
  34. 34. Business Problem • Find relationships between people, corporations, accounts, shell companies and offshore accounts • Journalists are non-technical • 2017 Leak from Appleby tax sheltering law firm matched 13.4 million account records with public business registrations data from across Caribbean Solution and Benefits • Exposed tax sheltering practices of Apple, Nike • Revealed hidden connections among politicians and nations, like Wilbur Ross & Putin’s son in law • Triggered government tax evasion investigations in US, UK, Europe, India, Australia, Bermuda, Canada and Cayman Islands within 2 days. • Granted $1M endowment from Golden Globes’ HFPA Background • International Consortium of Investigative Journalists (ICIJ), Pulitzer Prize winning journalists • Fourth blockbuster investigation using Neo4j to reveal connections in text-based, and account-based data leaked from offshore law firms and government records about the “1% Elite” • Appends Neo4j-based, “Offshore Leaks Database” ICIJ Paradise Papers INVESTIGATIVE JOURNALISM Fraud Detection / Knowledge Graph34
  35. 35. Neo4j Customer Deep Dive 35
  36. 36. 36 • Record “Cyber Monday” sales • About 35M daily transactions • Each transaction is 3-22 hops • Queries executed in 4ms or less • Replaced IBM Websphere commerce • 300M pricing operations per day • 10x transaction throughput on half the hardware compared to Oracle • Replaced Oracle database • Large postal service with over 500k employees • Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second. Handling Large Graph Work Loads for Enterprises Real-time promotion recommendations Marriott’s Real-time Pricing Engine Handling Package Routing in Real-Time
  37. 37. 3 Types of Knowledge Graphs Internal knowledge documents & files, with meta data tagging External data source aggregation mapped to entities of interest Context Rich Search External Event Insight Sensing Enterprise NLP Graph technical terms, acronyms, abbreviations, misspellings, etc. Examples: • MDM, Search • Customer support • Document classification Examples: • Supply chain/compliance risk • Market activity aggregation • Sales opportunities Examples: • Improved search • Chatbot implementation • Improved classification Context Independent WarehouseReal-Time WarehouseLogical Warehouse
  38. 38. Background • Social network of 10M graphic artists • Peer-to-peer evaluation of art and works-in-progress • Job sourcing site for creatives • Massive, millions of updates (reads & writes) to Activity Feed • 150 Mongos to 48 Cassandras to 3 Neo4j’s! Business Problem • Artists subscribe, appreciate and curate “galleries” of works of their own and from other artists • Activities Feed is how everyone receives updates • 1st implementation was 150 MongoDB instances • 2nd implementation shrunk to 48 Cassandras, but it was still too slow and required heavy IT overhead Solution and Benefits • 3rd implementation shrunk to 3 Neo4j instances • Saved over $500k in annual AWS fees • Reduced data footprint from 50TB to 40GB • Significantly easier to introduce new features like, “New projects in your Network” Adobe Behance Social Network of 10M Graphic Artists Social Network38 EE Customer since 2016 Q
  39. 39. Background • Personal shopping assistant • Converses with buyer via text, picture and voice to provide real-time recommendations • Combines AI and natural language understanding (NLU) in Neo4j Knowledge Graph • First of many apps in eBay's AI Platform Business Problem • Improve personal context in online shopping • Transform buyer-provided context into ideal purchase recommendations over social platforms • "Feels like talking to a friend" Solution and Benefits • 3 developers, 8M nodes, 20M relationships • Needed high-performance traversals to respond to live customer requests • Easy to train new algorithms and grow model • Generating revenue since launch eBay for Google Assistant ONLINE RETAIL Knowledge Graph powers Real-Time Recommendations39 EE Customer since 2016 Q3
  40. 40. Background • Over 7M citizens suffer from Diabetes • Connecting over 400 researchers • Incorporates over 50 databases, 100k’s of Excel workbooks, 30 database of biological samples • Sought to examine disease from as many angles as possible. Business Problem • Genes are connected by proteins or to metabolites, and patients are connected with their diets, etc… • Needed to improve the utilization of immensely technical data • Needed to cater to doctors and researchers with simple navigation, communication and connections of the graph. Solution and Benefits • Dr. Alexander Jarasch, Head of Bioinformatics and Data Management • Scientists can conduct parallel research without asking the same questions or repeating tests • Built views like a liver sample knowledge graph DZD - German Center for Diabetes Research Medical Genomic Research40 EE Customer since 2016 Q
  41. 41. Questions? Answers! 41
  42. 42. • 9:00 - 9:30 - Breakfast Networking - Welcome • 9:30 - 12:30 - Presentations • Introduction to the Neo4j Graph Platform Rik Van Bruggen, Neo4j • Exploring networks of companies to predict default probabilities Aiko Yamashita, DNB • Building Intelligent Solutions with Graphs Dinuke Abeysekera, Neo4j • 12:30 - Q&A & Networking 42 Our agenda for today
  43. 43. Thank You 43