Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Visual Analysis of Social Media Data from Meetup
using Graph Technologies
DATA NATIVES 2018 | Nov 22-23, 2018 | Berlin
Karin Patenge | Principal Solution Engineer | Cloud & Core Technologies
@kpatenge |  karin.patenge@oracle.com
Oracle Deutschland B.V. & Co. KG | Potsdam | Schiffbauergasse 14
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | @kpatenge @datanativesconf #DN18
Accessing Data Entities
• Data retrieval via REST API
https://www.meetup.com/meetup_api
• Different API methods & versions
• API Key required
• Sample request
• Data returned as JSON
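The "Sample request" bullet can be illustrated with a minimal Python sketch of how such a request URL might be assembled and how a JSON response might be read. The base URL, the `/find/groups` path, the `key` and `text` parameter names, and the JSON field names are assumptions for illustration and are not taken from the slides; check the Meetup API documentation for the methods and versions actually available.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.meetup.com"  # assumed base URL, not stated on the slide


def build_request_url(path, api_key, **params):
    """Build a GET URL that passes the API key as a query parameter (assumed scheme)."""
    query = urlencode({"key": api_key, **params})
    return f"{API_BASE}{path}?{query}"


def parse_groups(json_text):
    """Extract (id, name, members) tuples from a JSON array of group objects."""
    return [(g["id"], g["name"], g.get("members", 0)) for g in json.loads(json_text)]


# Hypothetical method path and search text
url = build_request_url("/find/groups", "YOUR_API_KEY", text="big data")

# Hand-written stand-in for a response body; field names are assumptions
sample = '[{"id": 1, "name": "Example Group", "members": 120}]'
groups = parse_groups(sample)
```

The sketch deliberately stops short of sending the request, so it runs without network access or a valid key.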
Potential Questions of Interest
• Which Meetup groups are most active in terms of:
– # members
– # events
– # event attendees
• Who and where are influencers in the Meetup community?
• Where are connections between the Meetup groups in different locations?
• Which topics are “hot” and where?
• How close/similar are groups?
• …
Approach: Modeling Data as Graphs
The more connected the data is, the better a Graph fits
Source: http://www.ateam-oracle.com/intro-to-graphs-at-oracle/
What is a Property Graph?
• A set of nodes (aka vertices)
– each vertex has a unique identifier
– each vertex has a set of in/out edges
– each vertex has a collection of key-value properties
• A set of edges
– each edge has a unique identifier
– each edge has a head/tail vertex
– each edge has a label denoting the type of relationship between two vertices
– each edge has a collection of key-value properties
• Implementations
– Oracle (Spatial and Graph/Big Data Spatial and Graph), Neo4j, DataStax (Titan), InfiniteGraph, …
Source: https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
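The property graph model described on this slide can be sketched as a small in-memory data structure. This is an illustrative Python sketch only; the class and method names are hypothetical and do not correspond to the API of Oracle, Neo4j, or any other implementation listed above. It follows the convention that an edge runs from its tail (source) vertex to its head (destination) vertex.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through cyclic references
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)  # key-value properties
    out_edges: list = field(default_factory=list)   # edges leaving this vertex
    in_edges: list = field(default_factory=list)    # edges arriving at this vertex


@dataclass(eq=False)
class Edge:
    id: int
    tail: Vertex   # source vertex
    head: Vertex   # destination vertex
    label: str     # type of relationship
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}  # vertex id -> Vertex
        self.edges = {}     # edge id -> Edge

    def add_vertex(self, vid, **props):
        v = Vertex(vid, dict(props))
        self.vertices[vid] = v
        return v

    def add_edge(self, eid, tail_id, head_id, label, **props):
        tail, head = self.vertices[tail_id], self.vertices[head_id]
        e = Edge(eid, tail, head, label, dict(props))
        self.edges[eid] = e
        tail.out_edges.append(e)
        head.in_edges.append(e)
        return e


# Hypothetical Meetup-style data: a group located in a city
g = PropertyGraph()
g.add_vertex(1, type="group", name="Example Group")
g.add_vertex(2, type="city", name="Berlin")
g.add_edge(100, 1, 2, "located_in", weight=1.0)
```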
Graph Algorithms and their Applications
• PageRank, Weighted PageRank
– Find influencers, critical vertices
• Personalized PageRank
– Find important people/products/... with respect to a given starting point
• Sparsification
– Trim down the graph to make it more fragmented
• Clustering
– Find communities which can be the basis of segmentation and/or recommendation, anomaly detection, churn analysis
• Centrality
– Find critical people/devices/...
• Shortest path
– Discover links, find a suspect's close collaborators, transportation routing
• Breadth-First Search (BFS)
– Impact analysis, link analysis
• Matrix factorization
– Recommendation
• Reachability
– Connectivity test
• ...
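To make the first entry concrete, here is a minimal sketch of PageRank computed by power iteration over a directed edge list. It is a plain Python illustration under common default assumptions (damping factor 0.85, fixed iteration count, rank of vertices without outgoing edges spread uniformly); it is not the PGX implementation, and the function name and example graph are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return a dict vertex -> rank for a list of directed (source, target) edges."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread uniformly
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = {n: base for n in nodes}
        # Every vertex passes its rank in equal shares along its outgoing edges
        for n in nodes:
            for dst in out[n]:
                new_rank[dst] += damping * rank[n] / len(out[n])
        rank = new_rank
    return rank


# Hypothetical three-vertex graph: a and b point to c, c points back to a
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
top = max(ranks, key=ranks.get)
```

In this small graph the vertex with the most incoming links ends up with the highest rank, which is the sense in which PageRank is used to find influencers.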
Social Network Analysis Algorithms (1)
Pathfinding
– fattestPath
– shortestPathBellmanFord
– shortestPathBellmanFordReverse
– shortestPathDijkstra
– shortestPathDijkstraBidirectional
– shortestPathFilteredDijkstra
– shortestPathFilteredDijkstraBidirectional
– shortestPathHopDist
– shortestPathHopDistReverse
Ranking
– closenessCentralityUnitLength
– degreeCentrality
– eigenvectorCentrality
– Hyperlink-Induced Topic Search (HITS)
– inDegreeCentrality
– nodeBetweennessCentrality
– outDegreeCentrality
– PageRank, weighted PageRank
– approximatePagerank
– personalizedPagerank
– randomWalkWithRestart
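As an illustration of the idea behind a hop-distance shortest path, the following Python sketch computes the number of edges on the shortest path from a source vertex to every reachable vertex using breadth-first search. It is a generic textbook sketch, not the PGX `shortestPathHopDist` implementation; the function name and the example graph are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Return a dict vertex -> minimum number of hops from source (reachable vertices only)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, ()):
            if w not in dist:          # first visit is along a shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


# Hypothetical directed graph given as adjacency lists; "e" is not reachable from "a"
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
distances = hop_distances(adjacency, "a")
```

The same traversal also answers the reachability question: a vertex is reachable from the source exactly when it appears in the returned dictionary.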
https://tinyurl.com/pgxdocs
Social Network Analysis Algorithms (2)
Structure Evaluation
– Conductance
– countTriangles
– inDegreeDistribution
– outDegreeDistribution
– partitionConductance
– partitionModularity
– sparsify
– K-Core computes
Community Detection
– communitiesLabelPropagation
Recommendation
– salsa
– personalizedSalsa
– whomToFollow
Classic - Connected Components
– sccKosaraju
– sccTarjan
– wcc
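Weakly connected components group vertices that are linked when edge directions are ignored. The following Python sketch computes them with a union-find structure. It is a generic illustration, not the PGX `wcc` implementation; the function name and the example data are hypothetical.

```python
def weakly_connected_components(vertices, edges):
    """Return a list of vertex sets, one per weakly connected component."""
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the representative, shortening the path along the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Edge direction is ignored: both endpoints end up in the same set
    for src, dst in edges:
        parent[find(src)] = find(dst)

    components = {}
    for v in vertices:
        components.setdefault(find(v), set()).add(v)
    return list(components.values())


# Hypothetical graph with two separate clusters
components = weakly_connected_components([1, 2, 3, 4, 5], [(1, 2), (3, 2), (4, 5)])
```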
https://tinyurl.com/pgxdocs
Architecture of Oracle Property Graph Analytics
Property Graph formats: GraphML, GML, GraphSON, Flat Files
Property Graph Analytics Engine
PGX
• Toolkit for In-Memory, Parallel Graph Analytics containing
– PGX shell
– Analyst API with a large collection of built-in algorithms (45+)
– Enhance with user defined algorithms written in Green-Marl
– Tutorials, JavaDoc, Use Cases, and more
• Developed by Oracle Labs
• https://docs.oracle.com/cd/E56133_01/latest/index.html
PGQL – Property Graph Query Language
• http://pgql-lang.org/
• Graph Pattern Matching combined with SQL
• Developed by Oracle Labs
• Proposed for standardization
• Changes in Version 1.1: http://pgql-lang.org/spec/1.1/#breaking-syntax-changes-since-pgql-10
Data Processing and Analysis Workflow: Overview
Retrieve & Prepare: Prepare source data
• Using R for data retrieval via REST API and conversion JSON → CSV → OPV/OPE
Load & Build: Load nodes and edges data into a graph
• Using Oracle NoSQL DB as Graph data store
Analyze: Analyze graph data
• Using Graph Analytics Engine (PGX) and Property Graph Query Language (PGQL)
Visualize: Visualize graph data
• Using Cytoscape
Results: Summarize results
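The conversion step in the first stage can be sketched as follows. The talk used R; this sketch uses Python instead and skips the intermediate CSV stage. The line layouts are assumptions based on the Oracle property graph flat-file format as commonly documented (one property per line, type code 1 for strings, 2 for integers, 3 for floats, spaces encoded as `%20`); verify them against the Oracle documentation before relying on them. The function names and the sample record are hypothetical.

```python
import json
from urllib.parse import quote


def opv_line(vertex_id, key, value):
    """One vertex property per line: id,key,type,string_value,numeric_value,date_value."""
    if isinstance(value, int):
        return f"{vertex_id},{key},2,,{value},"            # assumed type code 2 = integer
    return f"{vertex_id},{key},1,{quote(str(value))},,"     # assumed type code 1 = string


def ope_line(edge_id, src_id, dst_id, label, key, value):
    """One edge property per line, here restricted to a float-valued property."""
    return f"{edge_id},{src_id},{dst_id},{label},{key},3,,{float(value)},"  # assumed 3 = float


# Hand-written stand-in for JSON returned by the REST API; field names are assumptions
records = json.loads('[{"id": 1, "name": "Example Group", "members": 120}]')

opv = [opv_line(r["id"], k, r[k]) for r in records for k in ("name", "members")]
ope = [ope_line(1000, 1, 2, "located_in", "weight", 1.0)]
```

Writing the `opv` and `ope` lists to two text files would give the pair of flat files that the Load & Build stage takes as input.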
Demo
'Big Data' Groups in relation with Topics and Cities
'Big Data' Groups in relation with Organizers, Cities and Events
'Big Data' Groups in relation with Organizers and Cities: Weakly Connected Components (WCC)
Ranking via PageRank (Top 10+1)
Some Results
✓ Which cities are tech hot spots?
✓ Who are important people in the Meetup landscape?
✓ Which Meetup groups cover which topics?
✓ Which Meetup groups are relevant in terms of #Members, #Participants of events, #Events?
✓ Which Meetup groups are related and how?
✓ Which topics are related and how?
• The way you model the graph influences the results of executing Graph algorithms
• The choice of edge directions matters, depending on the algorithm
• Attaching weights to edges is useful for certain algorithms
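The point about edge directions can be shown with a small Python sketch: the same relationships, modeled in opposite directions, make in-degree highlight different kinds of vertices. The vertex names and the `member_of` relationship are hypothetical examples, not data from the talk.

```python
from collections import Counter


def in_degree(edges):
    """Count incoming edges per vertex for a list of directed (source, target) edges."""
    return Counter(dst for _, dst in edges)


# Modeled as person -> group ("member_of"): groups collect incoming edges
member_of = [("alice", "g1"), ("bob", "g1"), ("alice", "g2")]

# Same relationships modeled as group -> person: people collect incoming edges
has_member = [(dst, src) for src, dst in member_of]

by_group = in_degree(member_of)
by_person = in_degree(has_member)
```

With the first direction the ranking surfaces the largest group; with the reversed direction it surfaces the person with the most memberships. Direction-sensitive algorithms such as PageRank are affected by this modeling choice in the same way.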
Key Takeaways
• The graph data model is well suited to questions about connectivity
• Graph databases are powerful tools, complementing relational and other databases
– Especially strong for analysis of graph topology and connectedness
• Visual analysis helps a great deal in understanding how data are connected
– New insights, especially into relationships, dependencies and behavioral patterns
• A wide variety of analytic tools and frameworks is available to answer all kinds of questions
• Oracle Graph Technologies can be combined with Open Source or 3rd party tools
Follow us @kpatenge @SpatialHannes @JeanIhm
karin.patenge@oracle.com
GitHub:
https://github.com/karinpatenge/DN2018
Blogs:
https://blogs.oracle.com/bigdataspatialgraph/
https://blogs.oracle.com/oraclespatial/
AskTom Office Hours for Property Graph:
https://asktom.oracle.com/pls/apex/f?p=100:551