Gain Insights with Graph Analytics

3rd in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084

See the magic of graphs in this session. Graph analysis can tackle problems such as detecting patterns of fraud or identifying influential customers, and do it quickly and efficiently. We’ll show you the APIs for accessing graphs and running analytics, such as finding influencers, communities, and anomalies, and how to use them from various languages, including Groovy, Python, and JavaScript, with Jupyter and Zeppelin notebooks.

Albert Godfrind (EMEA Solutions Architect), Zhe Wu (Architect), and Jean Ihm (Product Manager) walk you through, and take your questions.

Published in: Data & Analytics

  1. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. Oracle AskTOM Office Hours: Gain Insights with Graph Analytics. Perform powerful graph analysis using APIs and notebook interfaces. 2018.05.31. Albert Godfrind, Solutions Architect (@agodfrin); Zhe Wu, Architect (@alanzwu); Jean Ihm, Product Manager (@JeanIhm)
  2. AskTOM sessions on property graphs
     • Today’s session is the third on property graphs
       – February’s session covered “Introduction to Property Graphs”
       – March's session explained how to model graphs from relational data
       – In case you missed them, recordings are available at the URL below
     • Today’s topic: Gain Insights with Graph Analytics
     • Visit the Spatial and Graph landing page to view recordings; submit feedback, questions, and topic requests; view upcoming session dates and topics; and sign up
     https://devgym.oracle.com/pls/apex/dg/office_hours/3084
  3. The Story So Far …
  4. Graph Product Options
     Oracle Big Data Spatial and Graph
     • Available for Big Data platform/BDCS
       – Hadoop, HBase, Oracle NoSQL
     • Supported both on BDA and commodity hardware
       – CDH and Hortonworks
     • Database connectivity through Big Data Connectors or Big Data SQL
     • Included in Big Data Cloud Service
     Oracle Spatial and Graph
     • Available with Oracle 18c/12.2/DBCS
     • Using tables for graph persistence
     • Graph views on relational data
     • In-database graph analytics
       – Sparsification, shortest path, page rank, triangle counting, WCC, sub graphs
     • SQL queries possible
     • Included in Database Cloud Service
  5. The Architecture … (architecture diagram; labels as they appear)
     • REST/WebService/Notebooks; Java, Groovy, Python, …; Java APIs; Java APIs/JDBC/SQL/PLSQL
     • Graph Analytics: Parallel In-Memory Graph Analytics/Graph Query (PGX); Apache Spark
     • Graph Data Access Layer (DAL): Blueprints & Lucene/SolrCloud; RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)
     • Property Graph formats: GraphML, GML, GraphSON, Flat Files
     • Scalable and Persistent Storage Management: Oracle NoSQL Database, Oracle RDBMS, Apache HBase
  6. Graph Data Model
     • What is a graph?
       – Data model representing entities as vertices and relationships as edges
       – Optionally including attributes
     • Flexible data model
       – No predefined schema, easily extensible
       – Particularly useful for sparse data
     • Enabling new kinds of analytics
       – Overcoming limitations in relational technology
     (Figure: example graph with vertices A, B, C, D, E, F)
  7. Graph Analysis for Business Insight: Identify Influencers; Discover Graph Patterns in Big Data; Generate Recommendations
  8. The APIs for Graph Analytics
  9. The Architecture … (same diagram as slide 5, with the callout “What we’ll focus on now.”)
  11. Using the PGX Groovy Shell
      • Shell script does all the setup
      • Variables instance, session and analyst are pre-configured
          # Big Data Spatial and Graph installation:
          $ cd /opt/oracle/oracle-spatial-graph/property_graph/pgx/bin
          # or, for an Oracle Database installation:
          $ cd $ORACLE_HOME/md/property_graph/pgx/bin
          $ ./pgx
          PGX Shell 2.4.0
          type :help for available commands
          variables instance, session and analyst ready to use
          pgx>
  12. Invoking the Analyst Functions
      1. Get an In-Memory Analyst:
          session = Pgx.createSession("session_ID_1");
          analyst = session.createAnalyst();
      2. Read the graph from the data store into memory:
          pg = session.readGraphWithProperties(...);
      3. Perform analytical functions:
          analyst.countTriangles(...)
          analyst.shortestPathDijkstra(...)
          analyst.pagerank(...)
          analyst.wcc(...)
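Of the analyst functions listed on this slide, `wcc` (weakly connected components) is the only one not demonstrated later in the deck. To give a feel for what such an analysis computes, here is a minimal plain-Java sketch based on union-find. The class `WccSketch` and its methods are hypothetical illustration code, not part of the PGX API, and PGX may well use a different algorithm internally.

```java
import java.util.*;

// Hypothetical illustration, not PGX code: weakly connected components via union-find.
class WccSketch {
    // Follows parent pointers to the set representative, halving the path on the way.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Counts weakly connected components; each edge is {source, destination} and direction is ignored. */
    static int countComponents(int numVertices, int[][] edges) {
        int[] parent = new int[numVertices];
        for (int i = 0; i < numVertices; i++) parent[i] = i;
        int components = numVertices;
        for (int[] e : edges) {
            int a = find(parent, e[0]);
            int b = find(parent, e[1]);
            if (a != b) {          // the edge joins two previously separate components
                parent[a] = b;
                components--;
            }
        }
        return components;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {3, 4}};
        // Vertices 0-2 form one component, 3-4 another, and 5 is isolated.
        System.out.println(countComponents(6, edges));
    }
}
```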
  13. Catch that Zeppelin!
  14. URL of the PGX server
  19. Set Up the Graph Configuration
      Apache HBase
          cfb = GraphConfigBuilder.forPropertyGraphHbase();
          cfb.setZkQuorum("bigdatalite").setZkClientPort(2181);
          cfb.setZkSessionTimeout(120000);
          cfb.setInitialEdgeNumRegions(3);
          cfb.setInitialVertexNumRegions(3).setSplitsPerRegion(1);
          cfb.setName("connections");
      Oracle NoSQL
          cfb = GraphConfigBuilder.forPropertyGraphNosql();
          cfb.setHosts(["bigdatalite:5000"]);
          cfb.setStoreName("kvstore");
          cfb.setMaxNumConnections(2);
          cfb.setName("connections");
      Oracle RDBMS
          cfb = GraphConfigBuilder.forPropertyGraphRdbms();
          cfb.setJdbcUrl("jdbc:oracle:thin:@127.0.0.1:1521:orcl122");
          cfb.setUsername("scott").setPassword("tiger");
          cfb.setName("connections");
  20. More Setup
      • Select the properties to load
          cfb.addVertexProperty("name", PropertyType.STRING);
          cfb.addVertexProperty("role", PropertyType.STRING);
          cfb.addVertexProperty("occupation", PropertyType.STRING);
          cfb.addVertexProperty("country", PropertyType.STRING);
          cfb.addVertexProperty("religion", PropertyType.STRING);
          cfb.addEdgeProperty("weight", PropertyType.DOUBLE, "1");
      • We need the edge labels
          cfb.hasEdgeLabel(true).setLoadEdgeLabel(true);
      • Build the configuration
          cfg = cfb.build();
  21. Loading the Graph in Memory
      • Load the graph using a configuration
          pg = session.readGraphWithProperties(cfg);
      • Can also use a JSON configuration file
          pg = session.readGraphWithProperties("connections_config.json");
      • Build the configuration from a JSON file
          cfg = GraphConfigFactory.forAnyFormat().fromPath("connections_config.json");
          pg = session.readGraphWithProperties(cfg);
  22. Example configuration file for Oracle RDBMS
          {
            "format": "pg",
            "db_engine": "RDBMS",
            "jdbc_url": "jdbc:oracle:thin:@127.0.0.1:1521:orcl122",
            "username": "scott",
            "password": "tiger",
            "max_num_connections": 8,
            "name": "connections",
            "vertex_props": [
              {"name": "name", "type": "string"},
              {"name": "role", "type": "string"},
              {"name": "occupation", "type": "string"},
              {"name": "country", "type": "string"},
              {"name": "political", "type": "string"},
              {"name": "religion", "type": "string"}
            ],
            "edge_props": [
              {"name": "weight", "type": "double", "default": "1"}
            ],
            "edge_label": true,
            "loading": {"load_edge_label": true}
          }
  23. Explore the In-Memory Graph
      • How many vertices?
          pgx> pg.getNumVertices()
          ==> 80
      • How many edges?
          pgx> pg.getNumEdges()
          ==> 164
      • Fetch one vertex by id
          pgx> pg.getVertex(1l)
          ==> PgxVertex[ID=1]
      • Fetch one edge by id
          pgx> pg.getEdge(0l)
          ==> PgxEdge[ID=0]
  24. Explore the Graph
      • What properties do vertices have?
          pgx> pg.getVertexProperties()
          ==> VertexProperty[name=role,type=string,graph=connections]
          ==> VertexProperty[name=occupation,type=string,graph=connections]
          ==> VertexProperty[name=political,type=string,graph=connections]
          ==> VertexProperty[name=name,type=string,graph=connections]
          ==> VertexProperty[name=religion,type=string,graph=connections]
          ==> VertexProperty[name=country,type=string,graph=connections]
      • What properties do edges have?
          pgx> pg.getEdgeProperties()
          ==> EdgeProperty[name=weight,type=double,graph=connections]
  25. Explore the Graph
      • Get all vertices
          pgx> pg.getVertices()
          ==> PgxVertex[ID=1]
          ==> PgxVertex[ID=2]
          ==> PgxVertex[ID=3]
          ...
      • Get all edges
          pgx> pg.getEdges()
          ==> PgxEdge[ID=0]
          ==> PgxEdge[ID=1]
          ==> PgxEdge[ID=2]
          ...
  26. Page Rank
      • Get an approximate PageRank
          pgx> r = analyst.pagerank(graph:pg, max:1000, variant:'APPROXIMATE');
          ==> VertexProperty[name=approx_pagerank,type=double,graph=connections]
          pgx> r.getTopKValues(3)
          ==> PgxVertex[ID=1]=0.0608868998919989
          ==> PgxVertex[ID=60]=0.03445628038301776
          ==> PgxVertex[ID=42]=0.027831790283775117
      • Show the top influencers
          pgx> it = r.getTopKValues(3).iterator();
          pgx> while (it.hasNext()) {
          pgx>   v = it.next();
          pgx>   id = v.getKey().getId();
          pgx>   name = pg.getVertex(id).getProperty("name");
          pgx>   pr = v.getValue();
          pgx>   System.out.println(id + " " + name + " " + pr);
          pgx> }
          1 Barack Obama 0.0608868998919989
          60 Nicolas Maduro 0.03445628038301776
          42 NBC 0.027831790283775117
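For readers who want to see what a PageRank computation does under the hood, the following is a minimal power-iteration sketch in plain Java. It is not the PGX implementation and does not reproduce the `APPROXIMATE` variant used on the slide; the class `PageRankSketch`, the damping factor, and the choice to spread the rank of vertices without out-edges evenly over all vertices are assumptions made for illustration.

```java
import java.util.*;

// Hypothetical illustration, not PGX code: classic PageRank by power iteration.
class PageRankSketch {
    /** adj[v] lists the vertices that v links to. Returns one score per vertex, summing to 1. */
    static double[] pageRank(int[][] adj, double damping, int maxIterations, double tolerance) {
        int n = adj.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double dangling = 0.0;                 // rank held by vertices with no out-edges
            for (int v = 0; v < n; v++) {
                if (adj[v].length == 0) {
                    dangling += rank[v];
                    continue;
                }
                double share = rank[v] / adj[v].length;
                for (int w : adj[v]) next[w] += share;
            }
            double change = 0.0;
            for (int v = 0; v < n; v++) {
                next[v] = (1 - damping) / n + damping * (next[v] + dangling / n);
                change += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (change < tolerance) break;         // scores have stabilized
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] adj = {{2}, {2}, {0}};             // 0 -> 2, 1 -> 2, 2 -> 0
        System.out.println(Arrays.toString(pageRank(adj, 0.85, 1000, 1e-9)));
    }
}
```

In the three-vertex example, vertex 2 receives links from both other vertices and ends up with the highest score, which mirrors the "top influencers" ranking shown on the slide.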
  27. Betweenness Centrality
          pgx> b = analyst.vertexBetweennessCentrality(pg).getTopKValues(5);
          ==> PgxVertex[ID=1]=2225.6666666666665
          ==> PgxVertex[ID=2]=1029.0
          ==> PgxVertex[ID=3]=876.5
          ==> PgxVertex[ID=37]=797.0
          ==> PgxVertex[ID=45]=456.0
          pgx> it = b.iterator();
          pgx> while (it.hasNext()) {
          pgx>   v = it.next();
          pgx>   id = v.getKey().getId();
          pgx>   name = pg.getVertex(id).getProperty("name");
          pgx>   System.out.println(id + " " + name + " " + v.getValue());
          pgx> }
          1 Barack Obama 2225.6666666666665
          2 Beyonce 1029.0
          3 Charlie Rose 876.5
          37 Amazon 797.0
          45 CBS 456.0
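Betweenness centrality scores a vertex by how many shortest paths between other vertices pass through it. The sketch below shows the idea with Brandes' algorithm on an unweighted, directed graph in plain Java. The class `BetweennessSketch` is hypothetical, and whether PGX treats edges as directed or normalizes the scores in the same way is not stated in the slides, so the numbers it produces should not be expected to match the output above.

```java
import java.util.*;

// Hypothetical illustration, not PGX code: Brandes' betweenness centrality (unweighted, directed).
class BetweennessSketch {
    /** adj[v] lists the out-neighbors of v. Returns the unnormalized betweenness of every vertex. */
    static double[] betweenness(int[][] adj) {
        int n = adj.length;
        double[] centrality = new double[n];
        for (int s = 0; s < n; s++) {
            int[] dist = new int[n];
            Arrays.fill(dist, -1);
            double[] sigma = new double[n];        // number of shortest paths from s
            double[] delta = new double[n];        // dependency of s on each vertex
            List<List<Integer>> preds = new ArrayList<>();
            for (int i = 0; i < n; i++) preds.add(new ArrayList<>());
            Deque<Integer> queue = new ArrayDeque<>();
            Deque<Integer> visitOrder = new ArrayDeque<>();   // used as a stack
            dist[s] = 0;
            sigma[s] = 1;
            queue.add(s);
            while (!queue.isEmpty()) {             // breadth-first search from s
                int v = queue.poll();
                visitOrder.push(v);
                for (int w : adj[v]) {
                    if (dist[w] < 0) {
                        dist[w] = dist[v] + 1;
                        queue.add(w);
                    }
                    if (dist[w] == dist[v] + 1) {  // v lies on a shortest path to w
                        sigma[w] += sigma[v];
                        preds.get(w).add(v);
                    }
                }
            }
            while (!visitOrder.isEmpty()) {        // accumulate dependencies, farthest vertices first
                int w = visitOrder.pop();
                for (int v : preds.get(w)) {
                    delta[v] += sigma[v] / sigma[w] * (1 + delta[w]);
                }
                if (w != s) centrality[w] += delta[w];
            }
        }
        return centrality;
    }

    public static void main(String[] args) {
        int[][] chain = {{1}, {2}, {}};            // 0 -> 1 -> 2
        System.out.println(Arrays.toString(betweenness(chain)));
    }
}
```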
  28. Shortest Path on a Weighted graph
      • Get origin and destination vertices
          pgx> s = pg.getVertices(new VertexFilter("vertex.name = 'Barack Obama'"))[0];
          ==> PgxVertex[ID=1]
          pgx> d = pg.getVertices(new VertexFilter("vertex.name = 'Benedict Cumberbatch'"))[0];
          ==> PgxVertex[ID=53]
      • Compute the shortest path
          pgx> path = analyst.shortestPathDijkstra(pg, s, d, pg.getEdgeProperty("weight"));
          ==> PgxPath[graph=connections,exists=true]
      • Is there a path?
          pgx> path.exists();
          ==> true
      • What is the total weight of the path?
          pgx> path.getPathLengthWithCost();
          ==> 3.0
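The slide delegates the work to `analyst.shortestPathDijkstra`. As a reminder of what Dijkstra's algorithm does with the `weight` edge property, here is a minimal, self-contained Java sketch. The class `DijkstraSketch` and its edge encoding are hypothetical; it returns only the total path cost (the counterpart of `getPathLengthWithCost()`), and it assumes non-negative weights.

```java
import java.util.*;

// Hypothetical illustration, not PGX code: Dijkstra's algorithm with a binary heap.
class DijkstraSketch {
    /**
     * Each edge is {source, destination, weight} with a non-negative weight.
     * Returns the cost of the cheapest path, or +Infinity if dst is unreachable.
     */
    static double shortestPathCost(int numVertices, double[][] edges, int src, int dst) {
        List<List<double[]>> adj = new ArrayList<>();
        for (int i = 0; i < numVertices; i++) adj.add(new ArrayList<>());
        for (double[] e : edges) adj.get((int) e[0]).add(new double[]{e[1], e[2]});

        double[] dist = new double[numVertices];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[src] = 0.0;
        // Queue entries are {vertex, tentative distance}, ordered by distance.
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        queue.add(new double[]{src, 0.0});
        while (!queue.isEmpty()) {
            double[] top = queue.poll();
            int v = (int) top[0];
            if (top[1] > dist[v]) continue;        // stale entry, a shorter path was already found
            if (v == dst) break;                   // destination settled
            for (double[] e : adj.get(v)) {
                int w = (int) e[0];
                double candidate = dist[v] + e[1];
                if (candidate < dist[w]) {
                    dist[w] = candidate;
                    queue.add(new double[]{w, candidate});
                }
            }
        }
        return dist[dst];
    }

    public static void main(String[] args) {
        double[][] edges = {{0, 1, 1}, {1, 2, 1}, {2, 3, 1}, {0, 3, 5}};
        System.out.println(shortestPathCost(4, edges, 0, 3));
    }
}
```

In the example, the three-hop route with unit weights (total 3.0) beats the direct edge of weight 5, and an unreachable destination yields infinity, which corresponds to `path.exists()` being false.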
  29. PGQL
      • Find all instances of a given pattern/template in the data graph
      • Fast, scalable query mechanism
      Query: Find all people who are known to friends of 'Amber'.
          SELECT v3.name, v3.age
          FROM 'myGraph'
          WHERE (v1:Person WITH name = 'Amber') -[:friendOf]-> (v2:Person) -[:knows]-> (v3:Person)
      Data graph 'myGraph':
      • Vertices: :Person{100} name = 'Amber', age = 25; :Person{200} name = 'Paul', age = 30; :Person{300} name = 'Heather', age = 27; :Company{777} name = 'Oracle', location = 'Redwood City'
      • Edges: :worksAt{1831} startDate = '09/01/2015'; :friendOf{1173}; :knows{2200}; :friendOf{2513} since = '08/01/2014'
  30. Executing PGQL
          pgx> pg.queryPgql("
          pgx>   SELECT x.name, y.name
          pgx>   WHERE (x) -[:leads]-> (y)
          pgx>   ORDER BY x.name, y.name
          pgx> ").print()
          +--------------------------------------------+
          | x.name            | y.name                 |
          +--------------------------------------------+
          | "Bobby Murphy"    | "Snapchat"             |
          | "Ertharin Cousin" | "World Food Programme" |
          | "Evan Spiegel"    | "Snapchat"             |
          | "Google"          | "Nest"                 |
          | "Jack Ma"         | "Alibaba"              |
          | "Jeff Bezos"      | "Amazon"               |
          | "Pony Ma"         | "Tencent"              |
          | "Pope Francis"    | "The Vatican"          |
          | "Tony Fadell"     | "Nest"                 |
          +--------------------------------------------+
  31. Pre-Built Analytics
  32. Graph Algorithms and their Applications
      • Page rank, Weighted page rank – find influencers, critical vertices
      • Personalized page rank – find important people/products with respect to a given starting point
      • Sparsification – trim down the graph to make it more fragmented
      • Clustering – find communities, which can be the basis of segmentation, recommendation, anomaly detection, and churn analysis
  33. Graph Algorithms and their Applications
      • BFS – impact analysis, link analysis
      • Shortest path – discover links, find a suspect’s close collaborators, transportation routing
      • Matrix factorization – recommendation
      • Centrality (in-degree, out-degree, betweenness, closeness) – find critical people, devices, routers
  34. Graph Algorithms and their Applications
      • Salsa – recommendation
      • Graph Filtering – segmentation
      • Graph Query/Pattern matching – find anomalies, detect fraud, discover correlations, recommend by popularity, segmentation
      • Text search – find similarity, fuzzy ranking, relevancy ranking, recommendation
        – GeoSpatial search/filtering, sentiment analysis, faceted query
  35. Graph Analytics in practice
  36. Build Recommender System with Graph Technologies
  37. Building a Recommender System
      • Environment
        – Oracle Big Data Lite VM
        – Oracle Big Data Spatial and Graph
        – Apache SolrCloud
      • A “user-item” property graph
        – Vertices (items, descriptions, and users)
        – Edges (linking users and items)
      Recommendation: you may also like
  38. Building a Recommender System
      Multiple approaches, and they can be mixed together
      Collaborative filtering
      • People who liked similar items in the past will like similar items in the future
      Content-based filtering
      • Match item description
      • Match user profile
      • Relevancy ranking
      Personalized Page Ranking
      • Randomly navigate from a user to a product, then back to a user, …
      • Randomly jump to starting point(s)
      • A → u
      • u → B
      • B → w
      • w → C …
      (Figure: users A, B, C linked to items u, v, w, x)
  39. Personalized Page Rank-based Recommender System
      Random walk with restart
      Reference: https://blogs.oracle.com/bigdataspatialgraph/entry/intuitive_explanation_of_personalized_page
  40. Key API for Personalized Page Rank
      • API:
          ppr = analyst.personalizedPagerank(
              pgxGraph, vertexSet,
              0.0001 /* max error */,
              0.85 /* damping factor */,
              1000 /* max iterations */);
      • Result: ppr.getTopKValues()
          it = ppr.getTopKValues(9).iterator();
          while (it.hasNext()) {
            entry = it.next();
            vid = entry.getKey().getId();
            System.out.format("ppr=%.4f vertex=%s\n", entry.getValue(), opg.getVertex(vid));
          }
          ppr=0.2496 vertex=Vertex ID 1 {name:str:John, age:int:10}
          ppr=0.1758 vertex=Vertex ID 11 {type:str:Prod, desc:str:Kindle Fire}
          ppr=0.1758 vertex=Vertex ID 10 {type:str:Prod, desc:str:iPhone5, …}
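The "random walk with restart" behind personalized PageRank can be sketched in a few lines of plain Java. The version below is an illustration under stated assumptions, not the PGX implementation: the class `PersonalizedPageRankSketch` is hypothetical, the restart probability is `1 - damping`, and rank stranded on vertices without out-edges is returned to the start set. The parameters mirror the slide's call (max error, damping factor, max iterations).

```java
import java.util.*;

// Hypothetical illustration, not PGX code: personalized PageRank (random walk with restart).
class PersonalizedPageRankSketch {
    /** adj[v] lists the out-neighbors of v; startSet holds the vertices the walk restarts from. */
    static double[] ppr(int[][] adj, int[] startSet, double maxError, double damping, int maxIterations) {
        int n = adj.length;
        double[] restart = new double[n];
        for (int s : startSet) restart[s] = 1.0 / startSet.length;
        double[] rank = restart.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (adj[v].length == 0) {
                    dangling += rank[v];
                    continue;
                }
                double share = rank[v] / adj[v].length;
                for (int w : adj[v]) next[w] += share;
            }
            double change = 0.0;
            for (int v = 0; v < n; v++) {
                // With probability (1 - damping) the walker jumps back to the start set.
                next[v] = (1 - damping) * restart[v] + damping * (next[v] + dangling * restart[v]);
                change += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (change < maxError) break;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Users 0 and 1, items 2, 3 and 4; every "likes" edge is stored in both directions.
        int[][] adj = {{2, 3}, {3, 4}, {0}, {0, 1}, {1}};
        System.out.println(Arrays.toString(ppr(adj, new int[]{0}, 0.0001, 0.85, 1000)));
    }
}
```

In the example, user 0 never interacted with item 4, yet item 4 receives a positive score because user 1 shares item 3 with user 0. That indirect score is what makes the item a candidate for a "you may also like" recommendation.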
  41. Recommendation with C.F.
      • Matrix factorization
      • Graph intuition
        – A customer’s taste signature is defined by what he/she likes
        – An item’s taste signature is (recursively) defined by who likes it
        – A recursive graph algorithm solves for the taste signatures of both customers and items
      (Figure: Customer and Item vertices labeled with taste-signature vectors such as [0.758 0.331 0.124 …] and [0.328 0.172 0.519 …])
  42. Demo: Build Recommender System with Graph Technologies
  43. Circular Payment Fraud Detection
  44. Circular Payment Fraud Detection
      • Business requirement: detect n-hop circular payments, a particular fraud pattern, in real-world transaction data
      • Input: transaction data (from a tax department) in CSV
      • Output: identify circular payments in the form A => B => C => … => A
  45. Payment Data in Property Graph Format (.opv & .ope)
      Vertices (.opv):
          223855785052710,original_id,1,223855785052710,,
          223855785052710,xzqh_dm_shi,1,223855,,
          345337155048029,original_id,1,345337155048029,,
          345337155048029,hymc,1,批发业,,
          345337155048029,xzqh_dm_shi,1,345355,,
          398238126805852918,original_id,1,19545407609453X,,
          398238126805852918,xzqh_dm_shi,1,195455,,
          345337078144545,original_id,1,345337078144545,,
          345337078144545,hymc,1,装卸搬运和运输代理业,,
          345337078144545,xzqh_dm_shi,1,345355,,
          345337984038796,original_id,1,345337984038796,,
          345337984038796,hymc,1,装卸搬运和运输代理业,,
          345337984038796,xzqh_dm_shi,1,345355,,
      Edges (.ope):
          167,194053906467083,345343155038506,bought_from,recorded_by,1,000,,
          167,194053906467083,345343155038506,bought_from,expiration,1,,,
          342,345358506939459,345356155269539,bought_from,recorded_by,1,263,,
          342,345358506939459,345356155269539,bought_from,expiration,1,,,
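The sample rows suggest a simple comma-separated layout: six fields per vertex row (vertex ID, property key, value type, then string, numeric, and date value slots) and nine fields per edge row (edge ID, source, destination, label, property key, value type, then the same three value slots). The sketch below parses rows on that assumption. The class `FlatFileSketch` is hypothetical, it handles only value type `1` (which carries a string value in every sample row), and it ignores any escaping rules the real format may define.

```java
import java.util.*;

// Hypothetical illustration: naive parser for the .opv/.ope rows shown on the slide.
class FlatFileSketch {
    /** Parses a vertex row and returns {vertexId, propertyKey, stringValue}. */
    static String[] parseVertexLine(String line) {
        String[] f = line.split(",", -1);          // limit -1 keeps trailing empty fields
        if (f.length != 6) throw new IllegalArgumentException("expected 6 fields: " + line);
        requireStringType(f[2]);
        return new String[]{f[0], f[1], f[3]};
    }

    /** Parses an edge row and returns {edgeId, sourceId, destinationId, label, propertyKey, stringValue}. */
    static String[] parseEdgeLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 9) throw new IllegalArgumentException("expected 9 fields: " + line);
        requireStringType(f[5]);
        return new String[]{f[0], f[1], f[2], f[3], f[4], f[6]};
    }

    // Assumption: type code 1 means "string"; other codes are left out of this sketch.
    private static void requireStringType(String typeCode) {
        if (!"1".equals(typeCode)) {
            throw new UnsupportedOperationException("only value type 1 is handled, got " + typeCode);
        }
    }

    public static void main(String[] args) {
        String[] edge = parseEdgeLine("167,194053906467083,345343155038506,bought_from,recorded_by,1,000,,");
        System.out.println(edge[1] + " -[" + edge[3] + "]-> " + edge[2]);
    }
}
```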
  46. Circular Payment Detection Pipeline
      • Major steps involved
        – Convert source transaction data into Oracle flat files
        – Execute PGQL
        – Run built-in analytics
        – Visualize
  47. Circular Payment Detection Pipeline
      • Analytics and PGQL implementation details
        – PGQL
            select m, n where (m)-[e1]->(n)-[e2]->(m)
            select m, n, o where (m)-[e1]->(n)-[e2]->(o)-[e3]->(m)
            select m, n, o, p where (m)-[e1]->(n)-[e2]->(o)-[e3]->(p)-[e4]->(m)
            …
        – Add all_different(m, n, o, p) if needed
        – Use <- or just -[e]- if needed
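Each PGQL query above matches cycles of one fixed length. The same search can be expressed as a depth-limited traversal, sketched here in plain Java. The class `CircularPaymentSketch` is hypothetical and unrelated to how PGX evaluates PGQL. It enforces the equivalent of `all_different` and reports each cycle once, starting from its smallest vertex ID, whereas the PGQL pattern would return one row per rotation.

```java
import java.util.*;

// Hypothetical illustration, not PGX code: find directed payment cycles with exactly `hops` edges.
class CircularPaymentSketch {
    /** adj[v] lists the accounts that v paid. Each cycle is listed once, starting at its smallest vertex. */
    static List<List<Integer>> cyclesOfLength(int[][] adj, int hops) {
        List<List<Integer>> found = new ArrayList<>();
        for (int start = 0; start < adj.length; start++) {
            Deque<Integer> path = new ArrayDeque<>();
            path.addLast(start);
            extend(adj, start, start, hops, path, found);
        }
        return found;
    }

    private static void extend(int[][] adj, int start, int current, int remaining,
                               Deque<Integer> path, List<List<Integer>> found) {
        for (int next : adj[current]) {
            if (remaining == 1) {
                // Last hop: the cycle closes only if it returns to the start vertex.
                if (next == start) found.add(new ArrayList<>(path));
            } else if (next > start && !path.contains(next)) {
                // next > start avoids reporting rotations; contains() keeps vertices distinct.
                path.addLast(next);
                extend(adj, start, next, remaining - 1, path, found);
                path.removeLast();
            }
        }
    }

    public static void main(String[] args) {
        int[][] payments = {{1}, {2}, {0, 3}, {2}};   // 0->1, 1->2, 2->0, 2->3, 3->2
        System.out.println(cyclesOfLength(payments, 3));
        System.out.println(cyclesOfLength(payments, 2));
    }
}
```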
  48. Circular Payment Detection Pipeline
      • Analytics and PGQL implementation details
        – Run built-in analytics
            analyst.sccKosaraju(pgxGraph)
      https://en.wikipedia.org/wiki/Kosaraju%27s_algorithm
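Strongly connected components are useful here because every vertex on a payment cycle belongs to a component with more than one member, so SCC analysis narrows the search regardless of cycle length. The following is a minimal plain-Java sketch of Kosaraju's two-pass algorithm. The class `KosarajuSketch` is hypothetical and is not the PGX implementation behind `analyst.sccKosaraju`; it uses recursion, so very deep graphs would need an iterative rewrite.

```java
import java.util.*;

// Hypothetical illustration, not PGX code: Kosaraju's strongly connected components.
class KosarajuSketch {
    /** adj[v] lists the out-neighbors of v. Returns component[v], an ID shared by all vertices of one SCC. */
    static int[] scc(int[][] adj) {
        int n = adj.length;
        List<List<Integer>> reversed = new ArrayList<>();
        for (int i = 0; i < n; i++) reversed.add(new ArrayList<>());
        for (int v = 0; v < n; v++) {
            for (int w : adj[v]) reversed.get(w).add(v);
        }

        // Pass 1: depth-first search on the original graph, recording finish order.
        boolean[] seen = new boolean[n];
        Deque<Integer> finished = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (!seen[v]) firstPass(adj, v, seen, finished);
        }

        // Pass 2: search the reversed graph in decreasing finish time; each tree is one component.
        int[] component = new int[n];
        Arrays.fill(component, -1);
        int nextId = 0;
        while (!finished.isEmpty()) {
            int v = finished.pop();
            if (component[v] < 0) secondPass(reversed, v, nextId++, component);
        }
        return component;
    }

    private static void firstPass(int[][] adj, int v, boolean[] seen, Deque<Integer> finished) {
        seen[v] = true;
        for (int w : adj[v]) {
            if (!seen[w]) firstPass(adj, w, seen, finished);
        }
        finished.push(v);
    }

    private static void secondPass(List<List<Integer>> reversed, int v, int id, int[] component) {
        component[v] = id;
        for (int w : reversed.get(v)) {
            if (component[w] < 0) secondPass(reversed, w, id, component);
        }
    }

    public static void main(String[] args) {
        int[][] payments = {{1}, {2}, {0, 3}, {}};    // 0->1->2->0 is a cycle, 3 only receives
        System.out.println(Arrays.toString(scc(payments)));
    }
}
```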
  49. Circular Payment Detection Pipeline
  50. Circular Payment Detection Pipeline
  51. Circular Payment Visualization with Cytoscape
  52. Demo: Circular Payment Fraud Detection
  53. Resources
  54. Resources
      • Oracle Spatial and Graph & Big Data Spatial and Graph on OTN
        oracle.com/technetwork/database/options/spatialandgraph
        oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph
        – White papers, software downloads, documentation and videos
      • Blogs – examples, tips & tricks
        blogs.oracle.com/oraclespatial | blogs.oracle.com/bigdataspatialgraph
      • Property Graphs 101: How to Get Started with Property Graphs on the Oracle Database – Arthur Dayton, Vlamis Software
        https://youtu.be/QSj0zOjOAWI
      • YouTube channel: https://www.youtube.com/channel/UCZqBavfLlCuS0il6zNY696w
      • Oracle Big Data Lite Virtual Machine - a free sandbox to get started
        www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
        – Hands On Lab included in /opt/oracle/oracle-spatial-graph/ or http://github.com/oracle/BigDataLite/
  55. Tutorials: https://docs.oracle.com/cd/E56133_01/latest/tutorials/index.html
  56. Resources – social media and online communities
      • Follow the product team: @SpatialHannes, @JeanIhm, @agodfrin, @alanzwu
      • Oracle Spatial and Graph SIG user groups (search “Oracle Spatial and Graph Community”)
  57. AskTOM sessions on property graphs
      • Next Spatial and Graph session: July 17
        – Topic: Graph Visualization
      • View recordings; submit feedback, questions, and topic requests; view upcoming session dates and topics; sign up to get regular updates
      https://devgym.oracle.com/pls/apex/dg/office_hours/3084
  58. Thanks for attending! See you next month. https://devgym.oracle.com/pls/apex/dg/office_hours/3084