Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Visual Analysis of Social Media Data from
Meetup.com using Graph Technologies
code.talks 2018 | October 18/19, 2018 | Hamburg
Karin Patenge | Principal Solution Engineer | BU Cloud & Core Technologies
@kpatenge | karin.patenge@oracle.com | www.slideshare.net/kpatenge
Oracle Deutschland B.V. & Co. KG
@kpatenge #codetalkshh
Data of Interest
• Data retrieval via REST API
https://www.meetup.com/meetup_api
• Different API methods & versions
• API Key required
• Sample request
• Data returned as JSON
Figure (graph data model): edge labels is_interested_in, is_member_of, is_assigned_to, has_registered_for, is_located_in (twice), takes_place_at; node label Venue. Legend: direct relations not (yet) analyzed.
Potential Questions of Interest
• Which Meetup groups are most active in terms of:
– # members
– # events
– # event attendees
• Who and where are influencers in the Meetup community?
• Where are connections between the Meetup groups in different locations?
• Which topics are “hot” and where?
• How close/similar are groups?
• …
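One way to approach the question of how close or similar groups are is to compare their member sets. The sketch below uses Jaccard similarity (shared members divided by all members of both groups). This is one possible measure, not necessarily the one used in the talk, and the group names and member ids are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical membership data: group name -> set of member ids.
members_by_group = {
    "group_a": {1, 2, 3, 4},
    "group_b": {3, 4, 5},
    "group_c": {6, 7},
}


def jaccard(a, b):
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_similarities(groups):
    """Return (group_1, group_2, similarity) for every pair, most similar first."""
    pairs = [
        (g1, g2, jaccard(groups[g1], groups[g2]))
        for g1, g2 in combinations(sorted(groups), 2)
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


def largest_group(groups):
    """Return the name of the group with the most members."""
    return max(groups, key=lambda name: len(groups[name]))


if __name__ == "__main__":
    for g1, g2, score in group_similarities(members_by_group):
        print(f"{g1} ~ {g2}: {score:.2f}")
    print("largest:", largest_group(members_by_group))
```

The same counting idea extends to the "most active" question: replace the member sets with sets of event ids or attendee ids.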
Modeling Data as Graphs
The more connected the data is, the better a Graph fits
Graphic source: http://www.ateam-oracle.com/intro-to-graphs-at-oracle/
What is a Property Graph?
• A set of nodes (aka vertices)
– each vertex has a unique identifier
– each vertex has a set of in/out edges
– each vertex has a collection of key-value properties
• A set of edges
– each edge has a unique identifier
– each edge has a head/tail vertex
– each edge has a label denoting type of relationship between two vertices
– each edge has a collection of key-value properties
• Blueprints Java APIs
• Implementations
– Oracle (Spatial and Graph, Big Data Spatial and Graph), Neo4j, DataStax (Titan), InfiniteGraph, Dex, Sail, MongoDB, …
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
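The property graph model described on this slide can be sketched as a small in-memory data structure. This is an illustrative Python sketch only, not the Blueprints Java API or any Oracle API. All class and method names are hypothetical, and the sample vertices and the `since` property are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int                                          # unique identifier
    properties: dict = field(default_factory=dict)   # key-value properties
    out_edges: list = field(default_factory=list)    # ids of edges leaving this vertex
    in_edges: list = field(default_factory=list)     # ids of edges arriving at this vertex


@dataclass
class Edge:
    id: int                                          # unique identifier
    tail: int                                        # id of the vertex the edge starts from
    head: int                                        # id of the vertex the edge points to
    label: str                                       # type of relationship
    properties: dict = field(default_factory=dict)   # key-value properties


class PropertyGraph:
    """Minimal directed property graph keyed by vertex and edge ids."""

    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vertex_id, **properties):
        if vertex_id in self.vertices:
            raise ValueError(f"duplicate vertex id: {vertex_id}")
        self.vertices[vertex_id] = Vertex(vertex_id, properties)
        return self.vertices[vertex_id]

    def add_edge(self, edge_id, tail, head, label, **properties):
        if edge_id in self.edges:
            raise ValueError(f"duplicate edge id: {edge_id}")
        if tail not in self.vertices or head not in self.vertices:
            raise KeyError("both end vertices must exist before adding an edge")
        self.edges[edge_id] = Edge(edge_id, tail, head, label, properties)
        self.vertices[tail].out_edges.append(edge_id)
        self.vertices[head].in_edges.append(edge_id)
        return self.edges[edge_id]


# Hypothetical sample data using an edge label from the Meetup model.
graph = PropertyGraph()
graph.add_vertex(1, type="member", name="member_1")
graph.add_vertex(2, type="group", name="group_a")
graph.add_edge(100, 1, 2, "is_member_of", since=2018)
```

Storing edge ids rather than object references on each vertex keeps the two record types free of circular references, which makes them easy to print and compare.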
Graph Algorithms and their Applications
• PageRank, Weighted PageRank
– Find influencers, critical vertices
• Personalized PageRank
– Find important people/products/... with respect to a given starting point
• Sparsification
– Trim down the graph to make it more fragmented
• Clustering
– Find communities which can be the basis of segmentation and/or recommendation, anomaly detection, churn analysis
• Centrality
– Find critical people/devices/...
• Shortest path
– Discover links, find a suspect's close collaborators, transportation routing
• Breadth-first search (BFS)
– Impact analysis, link analysis
• Matrix factorization
– Recommendation
• Reachability
– Connectivity test
• ...
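To make the first entry concrete, here is a minimal sketch of plain (unweighted) PageRank by power iteration. It is a textbook formulation written for illustration, not the PGX implementation. The function name, the damping factor of 0.85, the fixed iteration count, and the three-vertex sample graph are assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Return a dict vertex -> rank for a directed graph.

    out_links maps each vertex to a list of vertices it points to.
    Vertices that only appear as targets are included automatically.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    if not nodes:
        return {}

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a and b both point to c, c points back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})

if __name__ == "__main__":
    for vertex, score in sorted(ranks.items(), key=lambda item: -item[1]):
        print(vertex, round(score, 3))
```

In this toy graph the vertex with the most incoming links ends up with the highest rank, which is the sense in which PageRank "finds influencers".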
Architecture of Property Graph Support
Figure (architecture diagram), components shown:
• REST/Web Service/Notebooks; Java, Groovy, Python, …; Java APIs; Java APIs/JDBC/SQL/PLSQL
• Graph Analytics: Parallel In-Memory Graph Analytics (PGX) / Graph Querying (PGQL)
• Graph Data Access Layer (DAL): Blueprints & Lucene/SolrCloud; RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)
• Property Graph formats: GraphML, GML, GraphSON, Flat Files
• Scalable and Persistent Storage Management: Oracle NoSQL Database, Oracle RDBMS, Apache HBase
• Apache Spark
Property Graph Analytics Engine
PGX
• Toolkit for In-Memory, Parallel Graph Analysis containing
– PGX shell
– Analyst API with a large collection of built-in algorithms (45+)
– Tutorials and more
• Developed by Oracle Labs
• https://docs.oracle.com/cd/E56133_01/latest/index.html
• https://event.cwi.nl/grades/2018/07-VanRest.pdf
PGQL – Property Graph Query Language
• http://pgql-lang.org/
• Graph Pattern Matching combined with SQL
• Developed by Oracle Labs
• Proposed for standardization
• Changes in Version 1.1: http://pgql-lang.org/spec/1.1/#breaking-syntax-changes-since-pgql-10
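The idea behind graph pattern matching can be illustrated without a query engine. The sketch below is plain Python, not PGQL and not the PGX API. It finds every two-edge path that matches a pair of edge labels, here member → event → group. The edge labels come from the Meetup model earlier in the deck. The edge directions, the function name, and the sample ids are assumptions.

```python
from collections import defaultdict

# Hypothetical edges as (source, label, target) triples.
edges = [
    ("member_1", "has_registered_for", "event_1"),
    ("member_2", "has_registered_for", "event_1"),
    ("member_1", "has_registered_for", "event_2"),
    ("event_1", "is_assigned_to", "group_a"),
    ("event_2", "is_assigned_to", "group_b"),
    ("member_1", "is_member_of", "group_a"),
]


def match_two_hop(edge_list, first_label, second_label):
    """Return sorted (a, b, c) for every path a -[first_label]-> b -[second_label]-> c."""
    # Index the second hop by its source vertex so each lookup is cheap.
    second_hop = defaultdict(list)
    for source, label, target in edge_list:
        if label == second_label:
            second_hop[source].append(target)

    matches = []
    for source, label, middle in edge_list:
        if label == first_label:
            for target in second_hop.get(middle, []):
                matches.append((source, middle, target))
    return sorted(matches)


paths = match_two_hop(edges, "has_registered_for", "is_assigned_to")

if __name__ == "__main__":
    for member, event, group in paths:
        print(f"{member} attends {event} of {group}")
```

A graph query language expresses such a pattern declaratively and leaves the join strategy to the engine. The hand-written index above is what that declarative form saves you from writing.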
Data Processing and Analysis Workflow: Overview
Retrieve & Prepare: Prepare source data
• Using R for data retrieval via REST API and conversion JSON → CSV → OPV/OPE
Load & Build: Load nodes and edges data into a graph
• Using Oracle NoSQL DB as Graph data store
Analyze: Analyze graph data
• Using Graph Analytics Engine (PGX) and Property Graph Query Language (PGQL)
Visualize: Visualize graph data
• Using Cytoscape
Results: Summarize results
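The first workflow step converts JSON into CSV. The talk used R for this. The sketch below shows the same idea in Python with the standard library only, as an illustration rather than the author's code. The JSON payload and its field names (`id`, `name`, `city`, `members`) are hypothetical and do not reflect the actual Meetup API response. The later CSV to OPV/OPE conversion is not shown.

```python
import csv
import io
import json

# Hypothetical API response: a JSON array of group records.
sample_json = """
[
  {"id": 1, "name": "group_a", "city": "Hamburg", "members": 120},
  {"id": 2, "name": "group_b", "city": "Berlin", "members": 80}
]
"""


def json_to_csv(json_text, columns):
    """Convert a JSON array of objects to CSV text with the given columns.

    Fields missing from a record become empty cells; fields that are
    not listed in columns are dropped.
    """
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({column: record.get(column, "") for column in columns})
    return buffer.getvalue()


csv_text = json_to_csv(sample_json, ["id", "name", "city"])

if __name__ == "__main__":
    print(csv_text, end="")
```

Keeping the column list explicit makes the script reusable for any location, since every request then yields the same CSV layout regardless of which optional fields a record carries.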
My Demo Environment
• Available for free:
Oracle Big Data Lite VM 4.11 running in Oracle VirtualBox
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
– Big Data Spatial and Graph (BDSG) 2.5 including Property Graph Query Language (PGQL) 1.0
• Gremlin, Apache Groovy Shell
• Zeppelin Notebook with PGX Interpreter
– Oracle NoSQL Database (minimal instance with 1 node, no replication, aka kvlite)
– RStudio
• Several R packages loaded
– Cytoscape 3.6.0
• Big Data Spatial and Graph 2.4 support installed
Demo
Summarize (Preliminary) Results
✓ Who are important people in the Meetup landscape?
✓ Which Meetup groups should we talk to for certain topics?
✓ Which Meetup groups are relevant in terms of #Members, #Participants of events, #Events
✓ Which Meetup groups are related and how?
✓ ...
Key Takeaways – So far
• The graph data model is well suited to focusing on connectivity
• Code written once, reusable many times to retrieve data for any desired location (city)
• Visual analysis helps a great deal to understand how data are connected
• Big variety of analytic tools and frameworks to answer all kinds of questions
– Integrated distributed, in-memory graph analytics engine
• Use case of how to combine Open Source with Oracle Technologies
• Please also check the latest Graph talks from the Analytics and Data Summit in March 2018
– https://analyticsanddatasummit.org/schedule/
Follow us @kpatenge @SpatialHannes @JeanIhm
karin.patenge@oracle.com
Blogs:
https://blogs.oracle.com/bigdataspatialgraph/
https://blogs.oracle.com/oraclespatial/
AskTom Office Hours for Property Graph:
https://asktom.oracle.com/pls/apex/f?p=100:551