ADV Slides: Graph Databases on the Edge

Graph databases may be the unsung heroes of data platforms. They are poised to expand dramatically in the next few years as important analytics increasingly depends on understanding the relationships in data. We live and work today in a highly connected world where individuals and their relationships shape perceptions, consumer behaviors, and many other business success factors. Where relationships form patterns, it is imperative to understand them. Graph databases are the technology best suited to determining and understanding data relationships.

This code-lite session is a primer on graph databases and the relationship data stored in them for the analytics architect in the enterprise. It will help you determine why, how, and where to apply graphs, and how to get started.

Published in: Data & Analytics

  1. Graph Databases on the Edge
      Presented by: William McKnight, “#1 Global Influencer in Data Warehousing” (Onalytica)
      President, McKnight Consulting Group, an Inc. 5000 Company in 2018 and 2017
      @williammcknight, www.mcknightcg.com, (214) 514-1444
      Second Thursday of every month, at 2:00 ET
  3. AnzoGraph DB
      • Triplestore with labeled properties
      • Built for diverse data harmonization and analytics at scale (trillions of triples and more)
      • Graph analytics such as PageRank and shortest path
      • BI-style analytics such as graph views, named queries, aggregates, and built-in data science functions
      • Native support for inferencing and ontologies
  4. (Diagram: key-value data with Order, Product, and Location entities.)
  7. (Diagram: fraud graph with Used_with, Receive_payment, and Sets_Up edges; some vertices are marked as involved in prior fraud cases.)
  9. Customers Achieve Sustainable Competitive Advantage by Adopting Graph Databases
      New products and services leveraging data relationships:
      • First to market, up and running in days, not weeks or months
      • Reduced churn, increased engagement, and uncovered fraud
      • Achieved a new company vision centered on a business graph
      • Leapfrogged the competition with a 360-degree view of the customer
      Reimagining existing applications and innovating with data relationships:
      • Kept the business running when data growth threatened to stop it
      • Drastically reduced project complexity and risk
      • Increased revenue and delighted customers by improving the user experience
      • Brought a new offering to market to compete with Amazon Prime & Fresh and Google Express
  10. Graph Growth Ahead
      “The application of graph processing and graph DBMSs will grow at 100 percent annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science.”
      “Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries.”
      Source: Gartner press release, February 2019, https://www.gartner.com/en/newsroom/press-releases/2019-02-18-gartner-identifies-top-10-data-and-analytics-technolo
  11. What Can Be Vertices?
      • Things
        – Bank accounts
        – Customer accounts
          • Mobile phones
        – Products
        – Trading networks, auctions
        – Water, power, gas grids
        – Disease, drugs, molecules
          • Interactions, transmission
        – Insurance policies
        – Machines, servers, URLs
        – Sensor networks
      • People
        – Customers, families
        – Employees
        – Affinity groups, clubs
          • Politics, causes, doctors
          • Professionals (LinkedIn)
        – Companies, institutions
      • Places
        – Map locations
          • Cities, landmarks
        – Retail stores
        – Houses or buildings
        – Communication networks
        – Transportation hubs
          • Airports, shipping lanes, etc.
  12. What Can Be Edges?
      • People
        – Relationships
        – Ideas, preferences
        – Email, phone calls, SMS, IM
        – Collaborations
      • Places
        – Roads, routes, railways
        – Water, power, gas, pipelines, telephone lines
        – Anything with GPS coordinates
      • Things
        – Events
        – Money, transactions
        – Purchases
        – Pressure, temperature
        – Diseases
        – Contraband
        – URLs
        – Phone calls
        – Citations
        – Weights, scores
        – Timestamps
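Slides 11 and 12 list what can become vertices and edges. The following is a minimal sketch of how such data might sit in a labeled property graph. It assumes an in-memory adjacency list. The `PropertyGraph` class and the sample labels (`Customer`, `BankAccount`, `OWNS`, `TRANSFERRED`) are hypothetical illustrations, not any product's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Hypothetical in-memory labeled property graph, for illustration only."""

    # vertex id -> (label, properties)
    vertices: dict = field(default_factory=dict)
    # vertex id -> list of (edge type, target id, properties)
    out_edges: defaultdict = field(default_factory=lambda: defaultdict(list))

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = (label, props)

    def add_edge(self, src, edge_type, dst, **props):
        # Edges are typed and carry their own properties (amounts, timestamps, ...).
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("add both endpoints before adding an edge")
        self.out_edges[src].append((edge_type, dst, props))

    def neighbors(self, vid, edge_type=None):
        # Follow outgoing edges, optionally keeping only one edge type.
        return [dst for etype, dst, _ in self.out_edges.get(vid, [])
                if edge_type is None or etype == edge_type]


# Hypothetical sample data: a customer, two accounts, and a money transfer.
g = PropertyGraph()
g.add_vertex("alice", "Customer", name="Alice")
g.add_vertex("acct1", "BankAccount", opened="2018-05-01")
g.add_vertex("acct2", "BankAccount", opened="2019-01-15")
g.add_edge("alice", "OWNS", "acct1")
g.add_edge("acct1", "TRANSFERRED", "acct2", amount=250.0)
print(g.neighbors("alice", "OWNS"))  # ['acct1']
```

Storing edges on the source vertex means a traversal step only reads that vertex's own list. This is the property the later performance and algorithm slides rely on.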
  13. Social Network “path exists” Performance
      • Experiment:
        – 1,000 persons
        – Average of 50 friends per person
        – pathExists(a,b) limited to depth 4
        Database              # persons   Query time
        Relational database       1,000     2,000 ms
        Graph database            1,000         2 ms
        Graph database        1,000,000         2 ms
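The benchmark above times `pathExists(a,b)` limited to depth 4. The sketch below shows what such a query does: a breadth-first search that stops expanding past the depth limit. It is not the benchmark's code, and the friend data is hypothetical. Each hop reads a vertex's own adjacency list instead of performing a join. That is the usual explanation for why traversal time depends on the neighborhood explored rather than on total database size.

```python
from collections import deque


def path_exists(graph, a, b, max_depth=4):
    """Breadth-first search from a that gives up beyond max_depth hops.

    graph: dict mapping each person to a list of friends (list undirected
    friendships in both directions).
    """
    if a == b:
        return True
    seen = {a}
    frontier = deque([(a, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for friend in graph.get(node, ()):
            if friend == b:
                return True
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Hypothetical chain of friendships: ann - bob - cat - dan - eve - fay.
friends = {
    "ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob", "dan"],
    "dan": ["cat", "eve"], "eve": ["dan", "fay"], "fay": ["eve"],
}
print(path_exists(friends, "ann", "eve"))  # True: 4 hops
print(path_exists(friends, "ann", "fay"))  # False: 5 hops exceeds the limit
```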
  14. Excessive Relationships: Healthcare Fraud
      • Monitor drugs and treatments
        – Excessive prescribers
        – Excessive consumers
      • Patients connected to
        – Doctors, pharmacies
      • Use graph access
        – Find outliers and investigate
        – Find X actual frauds
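The “find outliers” step can be sketched as degree analysis on a bipartite doctor–patient graph. Excessive prescribers are doctors linked to unusually many patients. Excessive consumers are patients linked to unusually many doctors. This is a minimal sketch that assumes prescriptions arrive as (doctor, patient) pairs. The 3× median cutoff is an arbitrary illustrative threshold, not a method from the talk.

```python
from collections import defaultdict
from statistics import median


def excessive_vertices(edges, factor=3.0):
    """Flag vertices whose count of distinct neighbors exceeds factor x the
    median for their side of a bipartite graph (hypothetical heuristic)."""
    doctors, patients = defaultdict(set), defaultdict(set)
    for doctor, patient in edges:
        doctors[doctor].add(patient)
        patients[patient].add(doctor)

    def outliers(side):
        cutoff = factor * median(len(n) for n in side.values())
        return sorted(v for v, n in side.items() if len(n) > cutoff)

    # (excessive prescribers, excessive consumers)
    return outliers(doctors), outliers(patients)


# Hypothetical prescriptions: d1..d4 see two patients each, d5 sees twenty,
# and one patient ("shopper") visits all five doctors.
rx = [(f"d{i}", f"p{i}{j}") for i in range(1, 5) for j in range(2)]
rx += [("d5", f"q{j}") for j in range(20)]
rx += [(f"d{i}", "shopper") for i in range(1, 6)]
print(excessive_vertices(rx))  # (['d5'], ['shopper'])
```

A median-based cutoff is used here because a single extreme vertex barely moves the median. A mean and standard deviation, by contrast, would be pulled toward the outlier in small samples.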
  15. Relational DBs Can’t Handle Data Relationships Well
      • Cannot model or store data and relationships without complexity
      • Performance degrades with the number and levels of relationships, and with database size
      • Query complexity grows with the need for JOINs
      • Adding new types of data and relationships requires schema redesign, increasing time to market
      Result: slow development, poor performance, low scalability, hard to maintain … making traditional databases inappropriate when data relationships are valuable in real time
  16. Use the Right Database for the Right Job
      • Discrete, minimally connected data: other NoSQL, relational DBMS
      • Connected data, focused on data relationships: graph DB
      Graph databases are designed for data relationships.
      • Development benefits: model maintenance
      • Deployment benefits: performance, minimal resource usage
  17. Graph Visualization
  18. Graph Algorithms
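  19. PageRank
      • Starting point: Pages A, B, C, and D each have a score of 1.0
      • Each page passes on 0.85 × its score, split evenly across its outbound links: A links to B and C (1 × 0.85/2 each); B and D each link to C, and C links to A (1 × 0.85 each)
      • New score = sum of inputs + 0.15, that is, PR(p) = 0.15 + 0.85 × Σ PR(q)/L(q) over pages q linking to p, where L(q) is the number of q's outbound links
      • See http://www.whitelines.nl/html/google-page-rank.html (spreadsheet) and http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm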
  20. PageRank: Results After the 1st Iteration
      • Page A: 0.150 + 0.850 (from C) = 1.000
      • Page B: 0.150 + 0.425 (from A) = 0.575
      • Page C: 0.150 + 0.850 (from D) + 0.850 (from B) + 0.425 (from A) = 2.275
      • Page D: 0.150 (no inbound links) = 0.150
      Source: http://www.whitelines.nl/html/google-page-rank.html (see spreadsheet)
  21. PageRank Iterations
      End of iteration   A result   B result   C result   D result
      1                  1.000      0.575      2.275      0.150
      2                  2.084      0.575      1.191      0.150
      3                  1.163      1.036      1.652      0.150
      4                  1.554      0.644      1.652      0.150
      5                  1.554      0.810      1.485      0.150
      6                  1.413      0.810      1.627      0.150
      7                  1.533      0.750      1.567      0.150
      8                  1.482      0.801      1.567      0.150
      9                  1.482      0.780      1.588      0.150
      10                 1.500      0.780      1.570      0.150
      11                 1.485      0.788      1.578      0.150
      12                 1.491      0.781      1.578      0.150
      13                 1.491      0.784      1.575      0.150
      14                 1.489      0.784      1.577      0.150
      15                 1.491      0.783      1.576      0.150
      16                 1.490      0.784      1.576      0.150
      17                 1.490      0.783      1.577      0.150
      18                 1.490      0.783      1.576      0.150
      19                 1.490      0.783      1.577      0.150
      20                 1.490      0.783      1.577      0.150
  22. PageRank: 20 Iterations Until Convergence
      • Page A 1.49, Page B 0.78, Page C 1.58, Page D 0.15
      • Page C is the most important web page
      • Page C increases Page A's importance
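The arithmetic on slides 19 to 22 can be reproduced with a short script. The link structure (A→B, A→C, B→C, C→A, D→C) is inferred from the per-page contributions on slide 20. The update is the simple, non-normalized form used there: scores start at 1.0 and are not divided by the number of pages. Treat this as an illustration of the slides, not a production PageRank.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Non-normalized PageRank as on the slides: every page starts at 1.0 and
    each update is (1 - damping) + damping * sum(PR(q) / outdegree(q))."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    rank = {p: 1.0 for p in pages}
    history = []
    for _ in range(iterations):
        incoming = {p: 0.0 for p in pages}
        for src, targets in links.items():
            if not targets:
                continue  # a page with no outbound links passes nothing on
            share = rank[src] / len(targets)  # split score across outbound links
            for dst in targets:
                incoming[dst] += share
        # Every page is updated from the previous iteration's scores.
        rank = {p: (1 - damping) + damping * incoming[p] for p in pages}
        history.append(rank)
    return history


# Link structure implied by the slide 20 arithmetic.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
history = pagerank(links)
print({p: round(v, 3) for p, v in history[0].items()})   # row 1 of the table
print({p: round(v, 2) for p, v in history[-1].items()})  # slide 22 values
```

Each iteration reads only the previous iteration's scores, so every row matches the table. For example, row 2 gives A = 0.15 + 0.85 × 2.275 ≈ 2.084.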
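  23. Betweenness
      • Finds bridges across different communities
      • A high score means the vertex (or edge) lies on many shortest paths between other vertices, typically linking different communities
      (Diagram: two bridge vertices connecting communities.)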
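The bridge idea can be made concrete with Brandes' algorithm, a standard way to compute vertex betweenness. Betweenness is the number of shortest paths between other vertex pairs that pass through a vertex, shared fractionally when several shortest paths exist. This is a minimal sketch for small unweighted, undirected graphs, and the two-triangle graph is a hypothetical example.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for vertex betweenness on an unweighted, undirected
    graph {vertex: [neighbors]}; each unordered pair is counted once."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest vertices first.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical graph: two triangles joined by the edge c-d (the bridge vertices).
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(graph)
print(scores)
```

In this graph only c and d score above zero. Every path between the two triangles must cross them.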
  24. Closeness
      • Based on the shortest paths between a vertex and every other vertex: the smaller a vertex's total distance to all others, the higher its closeness
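Closeness can be sketched with one breadth-first search per vertex, since BFS yields shortest hop counts in an unweighted graph. This sketch uses the common (n − 1) / (sum of distances) form and assumes a connected, undirected graph. The path graph is a hypothetical example.

```python
from collections import deque


def closeness(graph):
    """Closeness centrality for a connected, unweighted graph {vertex: [neighbors]}:
    (n - 1) divided by the sum of shortest-path distances to all other vertices."""
    n = len(graph)
    result = {}
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS gives shortest hop counts in an unweighted graph
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (n - 1) / total if total else 0.0
    return result


# Hypothetical path a - b - c - d - e: the middle vertex is closest to everyone.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(closeness(path))
```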
  25. Eigenvector Centrality
      • Measures the importance of a vertex by the importance of its neighbors
      (Diagram: a vertex connected to several important vertices must itself be important.)
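Eigenvector centrality can be sketched by power iteration: repeatedly replace each score with the sum of its neighbors' scores, then rescale. The sketch iterates on A + I (each vertex also keeps its own score). This has the same leading eigenvector as the adjacency matrix A, but avoids the oscillation plain iteration shows on bipartite graphs. The example graph is hypothetical.

```python
import math


def eigenvector_centrality(graph, iterations=200, tol=1e-12):
    """Power iteration on a non-empty, undirected graph {vertex: [neighbors]};
    scores are scaled to unit Euclidean length."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Own score plus neighbors' scores, i.e. one multiplication by (A + I).
        new = {v: score[v] + sum(score[w] for w in graph[v]) for v in graph}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - score[v]) for v in graph) < tol:
            return new
        score = new
    return score


# Hypothetical graph: a hub with four leaves, plus "q" hanging off leaf l4.
graph = {
    "hub": ["l1", "l2", "l3", "l4"], "l1": ["hub"], "l2": ["hub"],
    "l3": ["hub"], "l4": ["hub", "q"], "q": ["l4"],
}
centrality = eigenvector_centrality(graph)
print({v: round(x, 3) for v, x in centrality.items()})
```

Here l1 and q both have exactly one neighbor. Even so, l1 scores higher because its neighbor is the hub, which is the “important because its neighbors are important” idea on the slide.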
  26. Clustering Coefficient: Cascading Churn
      • If two people churn, what is the likelihood others will?
      • The two churners affect the central influencer.
      • Finally, all contacts churn.
      • An individual-focused model underestimates churn by 6X.
        SELECT * FROM LocalClusteringCoefficient(
            ON Calls AS edges PARTITION BY caller_from
            ON caller_from AS vertices PARTITION BY caller_id
            targetKey('caller_to')
            directed('f')
            degreeRange('[3:]')
            accumulate('personId')
        );
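The query above calls a platform-specific LocalClusteringCoefficient function. The following is a minimal Python sketch of what the measure computes for an undirected graph. For a vertex with k neighbors, it is the fraction of the k(k − 1)/2 possible neighbor pairs that are themselves connected. The call data is hypothetical, and the `min_degree` argument only loosely mirrors the `degreeRange('[3:]')` filter.

```python
from itertools import combinations


def local_clustering(edges, min_degree=3):
    """Local clustering coefficient per vertex of an undirected graph built from
    (a, b) pairs; vertices with fewer than min_degree neighbors are skipped."""
    nbrs = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-calls
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    result = {}
    for v, ns in nbrs.items():
        k = len(ns)
        if k < min_degree:
            continue
        # Count neighbor pairs that are connected to each other.
        links = sum(1 for x, y in combinations(ns, 2) if y in nbrs[x])
        result[v] = links / (k * (k - 1) / 2)
    return result


# Hypothetical call records (caller_from, caller_to).
calls = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("bob", "cat"), ("cat", "dan"), ("dan", "eve")]
print(local_clustering(calls))
```

A high coefficient means a caller's contacts also call one another. That tight grouping is where the slide's cascading-churn effect would be expected to spread.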
  27. Loopy Belief Propagation
      • Loopy belief propagation works by peer pressure
        – Node X gets its final belief value by listening to its neighbors
        – Nodes with known values propagate them through the graph
      • Adjacent nodes send messages saying “update your beliefs”
        – Based on priors, conditional probabilities, and evidence
      • Keep passing messages until a stable belief state is reached
      See https://www.ics.uci.edu/~welling/teaching/ICS279/GBP-vision.pdf
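The message passing described above can be sketched as sum-product loopy belief propagation with two states per node. Several choices here are hypothetical and made for illustration: the states (normal and suspicious), the priors, and the 0.8 “homophily” edge potential. Real deployments would tune these potentials.

```python
def loopy_bp(graph, priors, homophily=0.8, iterations=50, tol=1e-9):
    """Sum-product loopy belief propagation over two states per node
    (0 = normal, 1 = suspicious; a hypothetical labeling).

    graph:  undirected adjacency {node: [neighbors]}, listed in both directions
    priors: {node: (P(normal), P(suspicious))} before hearing from neighbors
    The edge potential rewards neighbors that share a state ("peer pressure").
    """
    psi = [[homophily, 1 - homophily], [1 - homophily, homophily]]
    # One message per directed edge, initialized to "no opinion".
    msg = {(i, j): [0.5, 0.5] for i in graph for j in graph[i]}
    for _ in range(iterations):
        new = {}
        for i, j in msg:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    # Prior x edge potential x messages into i from all but j.
                    prod = priors[i][xi] * psi[xi][xj]
                    for k in graph[i]:
                        if k != j:
                            prod *= msg[(k, i)][xi]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        change = max(abs(new[e][s] - msg[e][s]) for e in msg for s in (0, 1))
        msg = new
        if change < tol:  # stable belief state reached
            break
    beliefs = {}
    for i in graph:
        b = list(priors[i])
        for k in graph[i]:
            b = [b[x] * msg[(k, i)][x] for x in (0, 1)]
        z = sum(b)
        beliefs[i] = (b[0] / z, b[1] / z)
    return beliefs


# Hypothetical 4-node loop; only "a" has evidence (strongly suspicious).
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
priors = {"a": (0.1, 0.9), "b": (0.5, 0.5), "c": (0.5, 0.5), "d": (0.5, 0.5)}
beliefs = loopy_bp(ring, priors)
for node in sorted(beliefs):
    print(node, round(beliefs[node][1], 3))  # P(suspicious) after propagation
```

The nodes without evidence end up leaning toward “suspicious” purely through their connection to a. The pull is weaker the farther a node is from a.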
  28. Great Questions for Graph Databases
      • In what order did a specific set of related events happen?
      • Are there patterns of events in our data that seem to be related by time?
      • How far apart in a (social or physical) network are two “actors” and how strong is their relationship?
      • What are the identifiable social groups and what are the general patterns of such groups?
      • How important is any given “actor” in any given network and event?
      • What type of messages emanate from a specific area?
  29. How to Identify a Graph Workload
      • The workload is described with words like “network,” “hierarchy,” “tree,” “ancestry,” and “structure”
      • You are planning to use relational performance tricks
      • Your queries will be about pathing
      • You are limiting queries by their complexity
      • You are looking for “non-obvious” patterns in the data
  30. Graph Modeling
  31. The Domain Model
  32. Actions
      • Model actions depending on what you want as vertices:
        (Bill)-[:SENT]->(email)-[:TO]->(Jim)
        or
        (Bill)-[:EMAILED]->(Jim)
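The two modeling choices above differ in where the email's own details live. The following is a minimal sketch of turning the first shape into the second. It assumes edges are stored as Python tuples. The email id, its `subject` property, and the `collapse` helper are hypothetical illustrations, not Cypher or any database API.

```python
def collapse(sent, to, emails):
    """Turn (sender)-[:SENT]->(email)-[:TO]->(recipient) into
    (sender)-[:EMAILED {props}]->(recipient), copying the email's properties
    onto every new edge."""
    recipients = {}
    for email, person in to:
        recipients.setdefault(email, []).append(person)
    return [(sender, person, dict(emails.get(email, {})))
            for sender, email in sent
            for person in recipients.get(email, [])]


# Email as a vertex: it carries its own properties and can have many recipients.
sent = [("Bill", "email1")]                   # (person)-[:SENT]->(email)
to = [("email1", "Jim"), ("email1", "Ann")]   # (email)-[:TO]->(person)
emails = {"email1": {"subject": "Q3 forecast"}}

print(collapse(sent, to, emails))
```

The direct EMAILED edge makes person-to-person traversals one hop shorter. The cost is that the email's properties are copied onto every recipient's edge. Keeping the email as a vertex stores them once and lets other relationships point at the email itself.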
  33. Semantic Graphs
      • Subject: John R Peterson; Predicate: Knows; Object: Frank T Smith
      • Subject: Triple #1; Predicate: Confidence Percent; Object: 70
      • Subject: Triple #1; Predicate: Provenance; Object: Mary L Jones
  34. Triple Store
      • A triple is a data entity composed of subject-predicate-object
        – “Bob is 35”
        – “Bob knows Fred”
        – “William likes running”
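The triples on slides 33 and 34 can be held in a very small in-memory store that supports wildcard pattern matching, the basic operation behind triple-pattern queries. This is a minimal Python sketch. The class, the generated ids (`triple1`, …) and the predicate names (`knows`, `confidencePercent`, `provenance`, `age`, `likes`) are hypothetical, not the API of any RDF product. Giving each triple an id is one simple way to let triples describe other triples, as slide 33 does with confidence and provenance.

```python
class TripleStore:
    """Hypothetical in-memory triple store. Each triple gets an id so that
    other triples can make statements about it."""

    def __init__(self):
        self.triples = {}  # id -> (subject, predicate, object)

    def add(self, subject, predicate, obj):
        tid = f"triple{len(self.triples) + 1}"
        self.triples[tid] = (subject, predicate, obj)
        return tid

    def match(self, subject=None, predicate=None, obj=None):
        """Return (id, triple) pairs matching a pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return [(tid, t) for tid, t in self.triples.items()
                if all(want is None or want == have
                       for want, have in zip(pattern, t))]


store = TripleStore()
t1 = store.add("John R Peterson", "knows", "Frank T Smith")
store.add(t1, "confidencePercent", 70)          # statement about triple #1
store.add(t1, "provenance", "Mary L Jones")     # statement about triple #1
store.add("Bob", "age", 35)
store.add("Bob", "knows", "Fred")
store.add("William", "likes", "running")

print(store.match(predicate="knows"))  # who knows whom
print(store.match(subject=t1))         # everything said about triple #1
```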
  35. Conclusion
      • Graph is a fast-growing data category
      • It’s all about the use case. Good for graph:
        – Real-time recommendations
        – Fraud detection
        – Network and IT operations
        – Identity and access management
        – Graph-based search
        – Identifying relative importance
      • Reimagine your data as a graph
        – The whiteboard model is the physical model
      • Remember PageRank