Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
Oracle Ask TOM Office Hours:
Gain Insights with Graph Analytics
Perform powerful graph analysis using APIs and notebook interfaces.
2018.05.31
Albert Godfrind, Solutions Architect (@agodfrin)
Zhe Wu, Architect (@alanzwu)
Jean Ihm, Product Manager (@JeanIhm)
AskTOM sessions on property graphs
• Today’s is the third session on property graphs
– February’s session covered “Introduction to Property Graphs”
– March's session explained how to model graphs from relational data
– In case you missed them, recordings are available at the URL below
• Today’s topic: Gain Insights with Graph Analytics
• Visit the Spatial and Graph landing page to view recordings; submit
feedback, questions, topic requests; view upcoming session dates and
topics; sign up
https://devgym.oracle.com/pls/apex/dg/office_hours/3084
The Story So Far …
Graph Product Options
Oracle Big Data Spatial and Graph
• Available for Big Data platform/BDCS
– Hadoop, HBase, Oracle NoSQL
• Supported both on BDA and commodity hardware
– CDH and Hortonworks
• Database connectivity through Big Data Connectors or Big Data SQL
• Included in Big Data Cloud Service
Oracle Spatial and Graph
• Available with Oracle 18c/12.2/DBCS
• Using tables for graph persistence
• Graph views on relational data
• In-database graph analytics
– Sparsification, shortest path, page rank, triangle counting, WCC, sub graphs
• SQL queries possible
• Included in Database Cloud Service
The Architecture …
[Architecture diagram, top to bottom:]
• Interfaces: REST/WebService/Notebooks; Java, Groovy, Python, …
• Graph Analytics: Parallel In-Memory Graph Analytics/Graph Query (PGX), with Java APIs
• Graph Data Access Layer (DAL): Blueprints & Lucene/SolrCloud; RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON); Java APIs/JDBC/SQL/PLSQL
• Property Graph formats: GraphML, GML, GraphSON, Flat Files
• Scalable and Persistent Storage Management: Oracle NoSQL Database, Oracle RDBMS, Apache HBase
• Also shown: Apache Spark
Graph Data Model
• What is a graph?
– Data model representing entities as vertices and relationships as edges
– Optionally including attributes
• Flexible data model
– No predefined schema, easily extensible
– Particularly useful for sparse data
• Enabling new kinds of analytics
– Overcoming limitations in relational technology
[Figure: example graph with vertices A, B, C, D, E, F]
Graph Analysis for Business Insight
• Identify Influencers
• Discover Graph Patterns in Big Data
• Generate Recommendations
The APIs for Graph Analytics
The Architecture …
[Same architecture diagram as shown earlier, with a callout: "What we’ll focus on now."]
Using the PGX Groovy Shell
• Shell script does all the setup
• Variables instance, session and analyst are pre-configured
$ cd /opt/oracle/oracle-spatial-graph/property_graph/pgx/bin   # Big Data Spatial and Graph installation
$ cd $ORACLE_HOME/md/property_graph/pgx/bin                    # or: Oracle Database installation
$ ./pgx
PGX Shell 2.4.0
type :help for available commands
variables instance, session and analyst ready to use
pgx>
Invoking the Analyst Functions
1. Get an In-Memory Analyst:
session = Pgx.createSession("session_ID_1");
analyst = session.createAnalyst();
2. Read the graph from the data store into memory:
pg = session.readGraphWithProperties(...);
3. Perform analytical functions:
analyst.countTriangles(...)
analyst.shortestPathDijkstra(...)
analyst.pagerank(...)
analyst.wcc(...)
Catch that Zeppelin!
URL of the PGX server
Setup the Graph Configuration
• Apache HBase
cfb = GraphConfigBuilder.forPropertyGraphHbase();
cfb.setZkQuorum("bigdatalite").setZkClientPort(2181);
cfb.setZkSessionTimeout(120000);
cfb.setInitialEdgeNumRegions(3);
cfb.setInitialVertexNumRegions(3).setSplitsPerRegion(1);
cfb.setName("connections");
• Oracle NoSQL
cfb = GraphConfigBuilder.forPropertyGraphNosql();
cfb.setHosts(["bigdatalite:5000"]);
cfb.setStoreName("kvstore");
cfb.setMaxNumConnections(2);
cfb.setName("connections");
• Oracle RDBMS
cfb = GraphConfigBuilder.forPropertyGraphRdbms();
cfb.setJdbcUrl("jdbc:oracle:thin:@127.0.0.1:1521:orcl122");
cfb.setUsername("scott").setPassword("tiger");
cfb.setName("connections");
More Setup
• Select the properties to load
cfb.addVertexProperty("name", PropertyType.STRING);
cfb.addVertexProperty("role", PropertyType.STRING);
cfb.addVertexProperty("occupation", PropertyType.STRING);
cfb.addVertexProperty("country", PropertyType.STRING);
cfb.addVertexProperty("political", PropertyType.STRING);
cfb.addVertexProperty("religion", PropertyType.STRING);
cfb.addEdgeProperty("weight", PropertyType.DOUBLE, "1");
• We need the edge labels
cfb.hasEdgeLabel(true).setLoadEdgeLabel(true);
• Build the configuration
cfg = cfb.build();
Loading the Graph in Memory
• Load the graph using a configuration
pg = session.readGraphWithProperties(cfg);
• Can also use a JSON configuration file
pg = session.readGraphWithProperties("connections_config.json");
• Build the configuration from a JSON file
cfg = GraphConfigFactory.forAnyFormat().fromPath("connections_config.json");
pg = session.readGraphWithProperties(cfg);
Example configuration file for Oracle RDBMS
{
"format": "pg", "db_engine": "RDBMS",
"jdbc_url":"jdbc:oracle:thin:@127.0.0.1:1521:orcl122",
"username":"scott", "password":"tiger", "max_num_connections":8,
"name": "connections",
"vertex_props": [
{"name":"name", "type":"string"},
{"name":"role", "type":"string"},
{"name":"occupation", "type":"string"},
{"name":"country", "type":"string"},
{"name":"political", "type":"string"},
{"name":"religion", "type":"string"}
],
"edge_props": [
{"name":"weight", "type":"double", "default":"1"}
],
"edge_label":true,
"loading": {"load_edge_label": true}
}
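Before handing a configuration file like this to PGX, it can help to check which properties it declares. The following is a minimal sketch in plain Python (it does not use PGX): it parses a trimmed copy of the configuration and lists the declared property names. The `summarize` helper and the trimmed `CONFIG_TEXT` are illustrative assumptions, not part of the product.

```python
import json

# Trimmed copy of the configuration shown above (same keys, fewer properties).
CONFIG_TEXT = """
{
  "format": "pg", "db_engine": "RDBMS",
  "name": "connections",
  "vertex_props": [
    {"name": "name", "type": "string"},
    {"name": "role", "type": "string"}
  ],
  "edge_props": [
    {"name": "weight", "type": "double", "default": "1"}
  ],
  "loading": {"load_edge_label": true}
}
"""


def summarize(config_text):
    """Hypothetical helper: return (graph name, vertex props, edge props, load labels?)."""
    cfg = json.loads(config_text)
    vertex_props = [p["name"] for p in cfg.get("vertex_props", [])]
    edge_props = [p["name"] for p in cfg.get("edge_props", [])]
    load_labels = cfg.get("loading", {}).get("load_edge_label", False)
    return cfg["name"], vertex_props, edge_props, load_labels


name, vprops, eprops, labels = summarize(CONFIG_TEXT)
print(name, vprops, eprops, labels)
```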
Explore the In-Memory Graph
• How many vertices?
pgx> pg.getNumVertices()
==> 80
• How many edges?
pgx> pg.getNumEdges()
==> 164
• Fetch one vertex by id
pgx> pg.getVertex(1L)
==> PgxVertex[ID=1]
• Fetch one edge by id
pgx> pg.getEdge(0L)
==> PgxEdge[ID=0]
Explore the Graph
• What properties do vertices have?
pgx> pg.getVertexProperties()
==> VertexProperty[name=role,type=string,graph=connections]
==> VertexProperty[name=occupation,type=string,graph=connections]
==> VertexProperty[name=political,type=string,graph=connections]
==> VertexProperty[name=name,type=string,graph=connections]
==> VertexProperty[name=religion,type=string,graph=connections]
==> VertexProperty[name=country,type=string,graph=connections]
• What properties do edges have?
pgx> pg.getEdgeProperties()
==> EdgeProperty[name=weight,type=double,graph=connections]
Explore the Graph
• Get all vertices
pgx> pg.getVertices()
==> PgxVertex[ID=1]
==> PgxVertex[ID=2]
==> PgxVertex[ID=3]
...
• Get all edges
pgx> pg.getEdges()
==> PgxEdge[ID=0]
==> PgxEdge[ID=1]
==> PgxEdge[ID=2]
...
Page Rank
• Get an approximate PageRank
pgx> r = analyst.pagerank(graph:pg, max:1000, variant:'APPROXIMATE');
==> VertexProperty[name=approx_pagerank,type=double,graph=connections]
• Show the top influencers
pgx> r.getTopKValues(3)
==> PgxVertex[ID=1]=0.0608868998919989
==> PgxVertex[ID=60]=0.03445628038301776
==> PgxVertex[ID=42]=0.027831790283775117
pgx> it = r.getTopKValues(3).iterator();
pgx> while (it.hasNext()) {
pgx>   v = it.next();
pgx>   id = v.getKey().getId();
pgx>   name = pg.getVertex(id).getProperty("name");
pgx>   pr = v.getValue();
pgx>   System.out.println(id + " " + name + " " + pr);
pgx> }
1 Barack Obama 0.0608868998919989
60 Nicolas Maduro 0.03445628038301776
42 NBC 0.027831790283775117
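To make the ranking idea concrete, here is a minimal sketch of PageRank by power iteration in plain Python. It is not the PGX implementation (the call above uses an approximate variant whose internals are not shown in this deck); the toy edge list, the damping factor of 0.85 and the `pagerank`/`top_k` helper names are assumptions for illustration.

```python
def pagerank(edges, damping=0.85, max_iter=1000, tol=1e-9):
    """Plain power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by vertices without out-edges is spread evenly over all vertices.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


def top_k(rank, k):
    """Return the k highest-ranked (vertex, score) pairs, like getTopKValues(k)."""
    return sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy graph: three vertices point at "hub", which points back at "a".
toy_edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
toy_rank = pagerank(toy_edges)
for vertex, score in top_k(toy_rank, 3):
    print(vertex, round(score, 4))
```

The vertex that collects the most incoming rank ends up on top, which is the sense in which the deck speaks of "influencers".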
Betweenness Centrality
pgx> b = analyst.vertexBetweennessCentrality(pg).getTopKValues(5);
==> PgxVertex[ID=1]=2225.6666666666665
==> PgxVertex[ID=2]=1029.0
==> PgxVertex[ID=3]=876.5
==> PgxVertex[ID=37]=797.0
==> PgxVertex[ID=45]=456.0
pgx> it = b.iterator();
pgx> while (it.hasNext()) {
pgx>   v = it.next();
pgx>   id = v.getKey().getId();
pgx>   name = pg.getVertex(id).getProperty("name");
pgx>   bc = v.getValue();
pgx>   System.out.println(id + " " + name + " " + bc);
pgx> }
1 Barack Obama 2225.6666666666665
2 Beyonce 1029.0
3 Charlie Rose 876.5
37 Amazon 797.0
45 CBS 456.0
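Betweenness centrality counts, for each vertex, how many shortest paths between other vertices pass through it. The sketch below uses Brandes' algorithm on an unweighted directed graph in plain Python; it illustrates the measure and makes no claim about how PGX computes it. The four-vertex line graph is a made-up example.

```python
from collections import deque


def betweenness(nodes, edges):
    """Brandes' algorithm for unweighted, directed graphs (ordered pairs counted)."""
    adj = {v: [] for v in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search that counts shortest paths (sigma) per vertex.
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest vertices to the source.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


# Hypothetical line graph a - b - c - d, with each link stored in both directions.
line_nodes = ["a", "b", "c", "d"]
line_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d"), ("d", "c")]
line_score = betweenness(line_nodes, line_edges)
print(line_score)
```

The two inner vertices sit on every path between the ends, so they receive all of the score, while the end vertices receive none.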
Shortest Path on a Weighted graph
• Get origin and destination vertices
pgx> s = pg.getVertices(new VertexFilter("vertex.name = 'Barack Obama'"))[0];
==> PgxVertex[ID=1]
pgx> d = pg.getVertices(new VertexFilter("vertex.name = 'Benedict Cumberbatch'"))[0];
==> PgxVertex[ID=53]
• Compute the shortest path
pgx> path = analyst.shortestPathDijkstra(pg, s, d, pg.getEdgeProperty("weight"));
==> PgxPath[graph=connections,exists=true]
• Is there a path?
pgx> path.exists();
==> true
• What is the total weight of the path?
pgx> path.getPathLengthWithCost();
==> 3.0
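For readers who want to see what a weighted shortest-path search does, here is a minimal Dijkstra sketch in plain Python using a binary heap. It is an illustration only, independent of PGX; the adjacency map and vertex names are hypothetical, chosen so that the cheapest route has total weight 3.0.

```python
import heapq


def dijkstra(adj, source, target):
    """Return (path, cost) for the cheapest path, or (None, inf) if none exists.

    adj maps a vertex to a list of (neighbour, non-negative weight) pairs.
    """
    dist = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue  # stale heap entry
        done.add(v)
        if v == target:
            break
        for w, weight in adj.get(v, []):
            candidate = d + weight
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                prev[w] = v
                heapq.heappush(heap, (candidate, w))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


# Hypothetical weighted graph: the direct edge s -> d costs more than the detour.
toy_adj = {
    "s": [("a", 1.0), ("d", 5.0)],
    "a": [("b", 1.0)],
    "b": [("d", 1.0)],
}
best_path, best_cost = dijkstra(toy_adj, "s", "d")
print(best_path, best_cost)  # ['s', 'a', 'b', 'd'] 3.0
```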
PGQL
• Find all instances of a given pattern/template in the data graph
• Fast, scalable query mechanism
Query: Find all people who are known to friends of 'Amber'.
SELECT v3.name, v3.age
FROM 'myGraph'
WHERE
(v1:Person WITH name = 'Amber') -[:friendOf]-> (v2:Person) -[:knows]-> (v3:Person)
Data graph 'myGraph' (edge endpoints are visible only in the original figure):
• Vertices
– :Person{100}: name = 'Amber', age = 25
– :Person{200}: name = 'Paul', age = 30
– :Person{300}: name = 'Heather', age = 27
– :Company{777}: name = 'Oracle', location = 'Redwood City'
• Edges
– :worksAt{1831}: startDate = '09/01/2015'
– :friendOf{1173}
– :knows{2200}
– :friendOf{2513}: since = '08/01/2014'
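The pattern in the query can be read as a two-hop walk: start at the Person named Amber, follow a friendOf edge, then a knows edge. The plain-Python sketch below spells out that matching by hand. The vertex data comes from the slide, but the edge endpoints are assumptions (the extracted text does not preserve them), and `friends_known` is a hypothetical helper, not a PGQL or PGX function.

```python
# Vertex properties as listed on the slide.
vertices = {
    100: {"label": "Person", "name": "Amber", "age": 25},
    200: {"label": "Person", "name": "Paul", "age": 30},
    300: {"label": "Person", "name": "Heather", "age": 27},
    777: {"label": "Company", "name": "Oracle", "location": "Redwood City"},
}

# (edge id, source, destination, label). ASSUMPTION: the endpoints below are
# invented for illustration; only the edge ids and labels come from the slide.
edges = [
    (1173, 100, 200, "friendOf"),
    (2200, 200, 300, "knows"),
    (2513, 300, 100, "friendOf"),
    (1831, 200, 777, "worksAt"),
]


def friends_known(vertices, edges, start_name):
    """Match (v1:Person name=start_name) -[:friendOf]-> (v2:Person) -[:knows]-> (v3:Person)."""
    results = []
    for _, src1, dst1, label1 in edges:
        v1, v2 = vertices[src1], vertices[dst1]
        if label1 != "friendOf" or v1["label"] != "Person" or v2["label"] != "Person":
            continue
        if v1.get("name") != start_name:
            continue
        for _, src2, dst2, label2 in edges:
            v3 = vertices[dst2]
            if label2 == "knows" and src2 == dst1 and v3["label"] == "Person":
                results.append((v3["name"], v3["age"]))  # SELECT v3.name, v3.age
    return results


matches = friends_known(vertices, edges, "Amber")
print(matches)
```

A query engine performs the same kind of matching, but with indexes and parallelism instead of nested loops.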
Executing PGQL
pgx> pg.queryPgql("
pgx>   SELECT x.name, y.name
pgx>   WHERE (x) -[:leads]-> (y)
pgx>   ORDER BY x.name, y.name
pgx> ").print()
+--------------------------------------------+
| x.name            | y.name                 |
+--------------------------------------------+
| "Bobby Murphy"    | "Snapchat"             |
| "Ertharin Cousin" | "World Food Programme" |
| "Evan Spiegel"    | "Snapchat"             |
| "Google"          | "Nest"                 |
| "Jack Ma"         | "Alibaba"              |
| "Jeff Bezos"      | "Amazon"               |
| "Pony Ma"         | "Tencent"              |
| "Pope Francis"    | "The Vatican"          |
| "Tony Fadell"     | "Nest"                 |
+--------------------------------------------+
Pre-Built Analytics
Graph Algorithms and their Applications
• Page rank, Weighted page rank
– find influencers, critical vertices
• Personalized page rank
– find important people/products with respect to a given starting point
• Sparsification
– trim down the graph to make it more fragmented
• Clustering
– find communities which can be the basis of segmentation, and/or
recommendation/anomaly detection, churn analysis
Graph Algorithms and their Applications
• BFS
– impact analysis, link analysis
• Shortest path
– discover links, find suspect’s close collaborators, transportation routing
• Matrix factorization
– recommendation
• Centrality (in-degree, out-degree, betweenness, closeness)
– find critical people, devices, routers
Graph Algorithms and their Applications
• Salsa
– recommendation
• Graph Filtering
– segmentation
• Graph Query/Pattern matching
– find anomaly, detect fraud, discover correlation, recommend by popularity,
segmentation
• Text search
– find similarity, fuzzy ranking, relevancy ranking, recommendation,
– GeoSpatial search/filtering, sentiment analysis, faceted query
Graph Analytics in practice
Build Recommender System with
Graph Technologies
Building a Recommender System
• Environment
– Oracle Big Data Lite VM
– Oracle Big Data Spatial and Graph
– Apache SolrCloud
• A “user-item” property graph
– Vertices (items, descriptions, and users)
– Edges (linking users and items)
[Figure: “Recommendation: you may also like”]
Building a Recommender System
Multiple approaches - and they can be mixed together
Collaborative filtering
• People who liked similar items in the past will like similar items in the future
Content-based filtering
• Match item description
• Match user profile
• Relevancy ranking
Personalized Page Ranking
• Randomly navigate from a user to a product, then back to a user, …
• Randomly jump to starting point(s)
• Example walk: A → u, u → B, B → w, w → C, …
[Figure: graph with vertices A, B, C and u, v, w, x]
Personalized Page Rank-based Recommender System
Random walk with restart
Reference: https://blogs.oracle.com/bigdataspatialgraph/entry/intuitive_explanation_of_personalized_page
Key API for Personalized Page Rank
• API:
ppr = analyst.personalizedPagerank(
    pgxGraph,
    vertexSet,
    0.0001 /*max error*/,
    0.85 /*damping factor*/,
    1000 /*max iterations*/);
• Result: ppr.getTopKValues()
it = ppr.getTopKValues(9).iterator();
while (it.hasNext()) {
    entry = it.next();
    vid = entry.getKey().getId();
    System.out.format("ppr=%.4f vertex=%s\n", entry.getValue(), opg.getVertex(vid));
}
ppr=0.2496 vertex=Vertex ID 1 {name:str:John, age:int:10}
ppr=0.1758 vertex=Vertex ID 11 {type:str:Prod, desc:str:Kindle Fire}
ppr=0.1758 vertex=Vertex ID 10 {type:str:Prod, desc:str:iPhone5, …}
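The "random walk with restart" behind personalized PageRank can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the PGX implementation: the user-item links are hypothetical, A, B, C are treated as users and u, w, x as items, and the three numeric parameters simply mirror the max error, damping factor and iteration limit shown in the API call.

```python
def undirected(edge_list):
    """Build an adjacency map in which every link can be walked both ways."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def personalized_pagerank(adj, start, max_error=0.0001, damping=0.85, max_iter=1000):
    """Power iteration in which the random jump always returns to the start set."""
    nodes = list(adj)
    restart = {v: (1.0 / len(start) if v in start else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        new = {v: (1 - damping) * restart[v] for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new[w] += share
            else:
                # A vertex without links sends its rank back to the start set.
                for s in start:
                    new[s] += damping * rank[v] / len(start)
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < max_error:
            break
    return rank


# Hypothetical user-item links: A-u, B-u, B-w, C-w, C-x.
links = undirected([("A", "u"), ("B", "u"), ("B", "w"), ("C", "w"), ("C", "x")])
scores = personalized_pagerank(links, start={"A"})

# Recommend the best-scoring item that user A is not already linked to.
candidates = [item for item in ("u", "w", "x") if item not in links["A"]]
best = max(candidates, key=scores.get)
print(best)
```

Scores fall off with distance from the starting user, so the closest unseen item is the natural recommendation.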
Recommendation with C.F.
• Matrix factorization
• Graph intuition
– A customer’s taste signature is defined by what he/she likes
– An item’s taste signature is (recursively) defined by who likes it
– A recursive graph algorithm solves the taste signatures of both customers and items
[Figure: customer and item vertices, each annotated with a taste-signature vector such as [0.758 0.331 0.124 …] or [0.328 0.172 0.519 …]]
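The "taste signature" idea corresponds to the latent vectors learned by matrix factorization. The sketch below is a small stochastic-gradient version in plain Python, given as an illustration rather than the algorithm PGX ships; the ratings, the vector length k=2, the learning rate and the regularization value are all made-up assumptions.

```python
import random


def predict(cust_vec, item_vec):
    """A predicted rating is the dot product of the two taste signatures."""
    return sum(a * b for a, b in zip(cust_vec, item_vec))


def factorize(ratings, n_customers, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn one length-k vector per customer and per item by SGD."""
    rng = random.Random(seed)
    cust = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_customers)]
    item = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for c, i, rating in ratings:
            err = rating - predict(cust[c], item[i])
            for f in range(k):
                cf, itf = cust[c][f], item[i][f]
                # Each signature is nudged using the other one: the recursion
                # "customers are defined by items, items by customers".
                cust[c][f] += lr * (err * itf - reg * cf)
                item[i][f] += lr * (err * cf - reg * itf)
    return cust, item


def rmse(ratings, cust, item):
    """Root-mean-square error over the known ratings."""
    total = sum((r - predict(cust[c], item[i])) ** 2 for c, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical (customer, item, rating) triples.
known = [(0, 0, 5), (0, 1, 3), (1, 1, 4), (1, 2, 1), (2, 0, 4), (2, 2, 2)]
before = rmse(known, *factorize(known, 3, 3, epochs=0))
cust_vecs, item_vecs = factorize(known, 3, 3)
after = rmse(known, cust_vecs, item_vecs)
print(before > after)

# A rating that was never observed can now be estimated, e.g. customer 0 and item 2.
estimate = predict(cust_vecs[0], item_vecs[2])
```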
Demo
Build Recommender System with
Graph Technologies
Circular Payment Fraud Detection
Circular Payment Fraud Detection
• Business requirement: detect n-hop circular payments, a particular fraud
pattern, in real-world transaction data.
• Input: transaction data (from a tax department) in CSV
• Output: circular payments of the form A => B => C => … => A
Payment Data in Property Graph Format (.opv & .ope)
223855785052710,original_id,1,223855785052710,,
223855785052710,xzqh_dm_shi,1,223855,,
345337155048029,original_id,1,345337155048029,,
345337155048029,hymc,1,批发业,,
345337155048029,xzqh_dm_shi,1,345355,,
398238126805852918,original_id,1,19545407609453X,,
398238126805852918,xzqh_dm_shi,1,195455,,
345337078144545,original_id,1,345337078144545,,
345337078144545,hymc,1,装卸搬运和运输代理业,,
345337078144545,xzqh_dm_shi,1,345355,,
345337984038796,original_id,1,345337984038796,,
345337984038796,hymc,1,装卸搬运和运输代理业,,
345337984038796,xzqh_dm_shi,1,345355,,
167,194053906467083,345343155038506,bought_from,recorded_by,1,000,,
167,194053906467083,345343155038506,bought_from,expiration,1,,,
342,345358506939459,345356155269539,bought_from,recorded_by,1,263,,
342,345358506939459,345356155269539,bought_from,expiration,1,,,
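Judging from the sample lines, a vertex (.opv) record has six comma-separated fields (vertex ID, property key, value type, then three value slots) and an edge (.ope) record has nine (edge ID, source, destination, label, property key, value type, then three value slots). The plain-Python sketch below reads records under that assumption; the `parse_opv` and `parse_ope` helpers are hypothetical and take the first non-empty value slot as the property value.

```python
import csv
import io

OPV_SAMPLE = """\
345337155048029,original_id,1,345337155048029,,
345337155048029,hymc,1,批发业,,
345337155048029,xzqh_dm_shi,1,345355,,
"""

OPE_SAMPLE = """\
167,194053906467083,345343155038506,bought_from,recorded_by,1,000,,
167,194053906467083,345343155038506,bought_from,expiration,1,,,
"""


def first_value(slots):
    """Return the first non-empty value slot, or None if all are empty."""
    for slot in slots:
        if slot != "":
            return slot
    return None


def parse_opv(text):
    """Group vertex records into {vertex id: {key: value}}."""
    vertices = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        vertex_id, key = int(row[0]), row[1]
        vertices.setdefault(vertex_id, {})[key] = first_value(row[3:6])
    return vertices


def parse_ope(text):
    """Group edge records into {edge id: {src, dst, label, props}}."""
    edges = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        edge_id, src, dst, label, key = int(row[0]), int(row[1]), int(row[2]), row[3], row[4]
        edge = edges.setdefault(edge_id, {"src": src, "dst": dst, "label": label, "props": {}})
        edge["props"][key] = first_value(row[6:9])
    return edges


sample_vertices = parse_opv(OPV_SAMPLE)
sample_edges = parse_ope(OPE_SAMPLE)
print(sample_vertices)
print(sample_edges)
```

Each property occupies its own line, so a vertex or edge with several properties spans several records that share the same ID.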
Circular Payment Detection Pipeline
• Major steps involved
– Convert source transaction data into Oracle flat files
– Execute PGQL
– Run built-in analytics
– Visualize
Circular Payment Detection Pipeline
• Analytics and PGQL implementation details
– PGQL
select m, n
where (m)-[e1]->(n)-[e2]->(m)
select m, n, o
where (m)-[e1]->(n)-[e2]->(o)-[e3]->(m)
select m, n, o, p
where (m)-[e1]->(n)-[e2]->(o)-[e3]->(p)-[e4]->(m)
…
– Add all_different(m, n, o, p) if the matched vertices must be distinct
– Use <-[e]- for the reverse direction, or just -[e]- for either direction, if needed
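The fixed-length patterns above can be mimicked outside a query engine with a depth-limited search. The plain-Python sketch below enumerates directed cycles of exactly n hops with all vertices distinct, reporting each cycle once, starting from its smallest vertex. It illustrates what the queries look for and is not how PGQL evaluates them; the payment edges are invented.

```python
def cycles_of_length(edges, n):
    """Return every n-hop directed cycle with distinct vertices, as a tuple."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
    found = []

    def extend(path):
        last = path[-1]
        for nxt in sorted(adj.get(last, ())):
            if len(path) == n:
                # The n-th hop must close the cycle back to the first vertex.
                if nxt == path[0]:
                    found.append(tuple(path))
            elif nxt > path[0] and nxt not in path:
                # Only visit vertices larger than the start, so that each cycle
                # is reported once; "not in path" plays the role of all_different.
                extend(path + [nxt])

    for start in sorted(adj):
        extend([start])
    return found


# Hypothetical payments: A pays B, B pays A and C, C pays A and D.
payments = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "D")]
two_hop = cycles_of_length(payments, 2)
three_hop = cycles_of_length(payments, 3)
print(two_hop)    # [('A', 'B')]
print(three_hop)  # [('A', 'B', 'C')]
```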
Circular Payment Detection Pipeline
• Analytics and PGQL implementation details
– Run built-in analytics
analyst.sccKosaraju(pgxGraph)
https://en.wikipedia.org/wiki/Kosaraju%27s_algorithm
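Strongly connected components are useful here because every vertex on a circular payment belongs to a component with more than one member. The following plain-Python sketch follows Kosaraju's two-pass scheme (a depth-first pass for finish order, then a pass over the reversed graph); it is an illustration, not the PGX `sccKosaraju` implementation, and the payment edges are invented.

```python
def scc_kosaraju(nodes, edges):
    """Return the strongly connected components as sorted lists of vertices."""
    adj = {v: [] for v in nodes}
    radj = {v: [] for v in nodes}
    for src, dst in edges:
        adj[src].append(dst)
        radj[dst].append(src)

    # Pass 1: iterative depth-first search recording vertices in finish order.
    visited = set()
    order = []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:
                order.append(v)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish order.
    assigned = set()
    components = []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        members = []
        todo = [root]
        while todo:
            v = todo.pop()
            members.append(v)
            for w in radj[v]:
                if w not in assigned:
                    assigned.add(w)
                    todo.append(w)
        components.append(sorted(members))
    return components


# Hypothetical payments: A -> B -> C -> A form a cycle, and C also pays D.
parties = ["A", "B", "C", "D"]
payments = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
components = scc_kosaraju(parties, payments)
suspicious = [c for c in components if len(c) > 1]
print(suspicious)  # [['A', 'B', 'C']]
```

Components of size one can be discarded up front, which narrows the graph before running the more expensive fixed-length cycle queries.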
Circular Payment Detection Pipeline
Circular Payment Visualization with Cytoscape
Demo
Circular Payment Fraud Detection
Resources
Resources
• Oracle Spatial and Graph & Big Data Spatial and Graph on OTN
oracle.com/technetwork/database/options/spatialandgraph
oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph
– White papers, software downloads, documentation and videos
• Blogs – examples, tips & tricks
blogs.oracle.com/oraclespatial | blogs.oracle.com/bigdataspatialgraph
• Property Graphs 101: How to Get Started with Property Graphs on the Oracle Database –
Arthur Dayton, Vlamis Software https://youtu.be/QSj0zOjOAWI
• YouTube channel: https://www.youtube.com/channel/UCZqBavfLlCuS0il6zNY696w
• Oracle Big Data Lite Virtual Machine - a free sandbox to get started
www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
– Hands On Lab included in /opt/oracle/oracle-spatial-graph/ or http://github.com/oracle/BigDataLite/
Tutorials
https://docs.oracle.com/cd/E56133_01/latest/tutorials/index.html
Resources – social media and online communities
• Follow the product team: @SpatialHannes, @JeanIhm, @agodfrin, @alanzwu
• Oracle Spatial and Graph SIG user groups (search “Oracle Spatial and
Graph Community”)
AskTOM sessions on property graphs
• Next Spatial and Graph session July 17
– Topic: Graph Visualization
• View recordings, submit feedback, questions,
topic requests, view upcoming session dates and
topics, sign up to get regular updates
https://devgym.oracle.com/pls/apex/dg/office_hours/3084
Thanks for attending! See you next month.
https://devgym.oracle.com/pls/apex/dg/office_hours/3084