Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
Oracle Ask TOM Office Hours:
When Graphs Meet Machine
Learning
Sungpack Hong, Research Director, Oracle Labs
Jean Ihm, Product Manager @JeanIhm
AskTOM sessions on property graphs
• Today’s session is the sixth on property graphs
• In our past sessions, we showed …
– An introduction to Property Graphs, how to model graphs from relational data, perform graph
analytics, visualize graphs, query graphs
• Today’s topic: When Graphs Meet Machine Learning (use cases)
• Visit the Spatial and Graph landing page to view recordings of past
sessions; submit feedback, questions, topic requests; view upcoming
session dates and topics; sign up
https://devgym.oracle.com/pls/apex/dg/office_hours/3084
The Story So Far …
Oracle Big Data Spatial and Graph
• Available for Big Data platform/BDCS
– Hadoop, HBase, Oracle NoSQL
• Supported both on BDA and commodity
hardware
– CDH and Hortonworks
• Database connectivity through Big Data
Connectors or Big Data SQL
• Included in Big Data Cloud Service
Oracle Spatial and Graph
• Available with Oracle 18c/12.2/DBCS
• Using tables for graph persistence
• Graph views on relational data
• In-database graph analytics
– Sparsification, shortest path, PageRank, triangle
counting, WCC, subgraphs
• SQL queries possible
• Included in Database Cloud Service
Graph Product Options
Graph Storage Management
Architecture of Property Graph
Graph Analytics
Blueprints/Tinkerpop/Gremlin
REST/WebService
APIs
Java,Groovy,Python,…
Scalable and Persistent Storage
Oracle NoSQL
Database
Oracle RDBMS Apache HBase
Parallel In-Memory Graph Analytics
pgql> SELECT ...
Graph Analysis for Business Insight
Identify
Influencers
Discover Graph Patterns
in Big Data
Generate
Recommendations
When Graphs Meet Machine Learning:
Use Cases
Graph Benefits
• Why use a graph as the data model (as opposed to relational)?
• Some graph benefits
– Intuitive data model
– Fast query over multi-hop relationships
– Data visualization and interactive exploration
– Enhanced data analysis via graph signals
Using Graph for Data Analysis
• The main idea
– Graph captures fine-grained relationships (as edges) between data entities
– By using these (materialized) relationships as new signals,
– We can extract some useful information about the original data set
– … but exactly how?
Agenda
1. Classical Method: Via Graph Algorithms
2. Generating Features From Graph Algorithms
3. Graph Embedding Techniques
Classical Approach: Graph Algorithms
• Starting from graph data model
• Apply computational graph algorithms (+ graph queries)
– E.g. centrality, reachability, closeness, …
– Graph algorithm computes specific characteristics of the graph model
– Use the algorithm results to answer your question
• e.g. which entities are closer to other entities?
Example – Anomaly Detection in Medicare Data
• Using a Public Dataset
• From the US Centers for Medicare & Medicaid Services (CMS)
– Health-care Billing Data for CY 2012
– Aggregated medical transactions: 9,153,272 records with 29 variables
– Transactions between 880,644 medical providers and CMS
with total amounts > $77B for the year
• Data Entities
– Medical providers (doctors)
– Medical procedures – operations, prescription, treatments …
– i.e. Who is doing what (and charging it to Medicare)
• Anomaly Definition
– Doctors of the same specialty provide similar
services
– What if a doctor performs a lot of treatments
that typically belong to other specialties?
• E.g. a cardiologist doing plastic surgery?
• How do we find such cases?
→ By applying a graph algorithm
Anomaly and Graph Data Model
“There is a spy among us”
[Figure: CMS data represented as a graph connecting providers and services; example labels: Dr. Frankenstein, Prescribe Aspirin, Optometrists]
Approach (sketch)
• Pick a specialty
• Compute random-walk distance from the doctors
of this specialty group
• (By applying the personalized PageRank algorithm)
• And check if there are any outside-specialty doctors
who are exceptionally close to this specialty group
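The approach sketched above can be illustrated with a minimal personalized PageRank in plain Python. This is a sketch under stated assumptions, not the implementation used in the talk: the provider and procedure names, the undirected toy graph, and the function name `personalized_pagerank` are all hypothetical, and the damping factor of 0.85 is a common default rather than a value from the slides.

```python
from collections import defaultdict

def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank on an undirected graph.

    The random walk restarts only at the `seeds` vertices, so a high
    score means a vertex is close (in random-walk terms) to the seeds.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * score[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        score = nxt
    return score

# Hypothetical toy data: providers linked to the procedures they bill.
edges = [
    ("eye_doc_1", "eye_exam"), ("eye_doc_1", "retina_laser"),
    ("eye_doc_2", "eye_exam"), ("eye_doc_2", "retina_laser"),
    ("stomach_doc_1", "endoscopy"), ("stomach_doc_1", "colonoscopy"),
    ("stomach_doc_2", "endoscopy"), ("stomach_doc_2", "colonoscopy"),
    ("stomach_doc_x", "endoscopy"), ("stomach_doc_x", "eye_exam"),
    ("stomach_doc_x", "retina_laser"),
]
seeds = {"eye_doc_1", "eye_doc_2"}          # the chosen specialty group
scores = personalized_pagerank(edges, seeds)

# Rank outside-specialty doctors by closeness to the seed group.
outsiders = ["stomach_doc_1", "stomach_doc_2", "stomach_doc_x"]
ranked = sorted(outsiders, key=scores.get, reverse=True)
print(ranked[0])
```

In this toy graph the outsider who bills eye procedures ranks closest to the eye-doctor group, which is the kind of candidate the approach would flag for review.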
• False positives
– From popular common procedures
Procedures
– Subsequent hospital inpatient care, typically 25 minutes per day
– Emergency department visit, moderately severe problem
– Initial hospital inpatient care, typically 70 minutes per day
– Subsequent hospital inpatient care, typically 35 minutes per day
– Initial hospital inpatient care, typically 50 minutes per day
– …
Dealing with practical issues (1/2)
[Figure: what an eye doctor does vs. what every doctor does; Doctor X, Doctor Y]
• We can identify such procedures via another graph
algorithm – PageRank
• And treat them specially (details omitted here)
Dealing with practical issues (2/2)
• False positives
– From close specialties
– Which do similar things by nature
• What we did (details omitted here)
– Statistically identify such specialties
– Treat them differently
Blue: PPR distribution of Optometrists
Red: PPR distribution of Ophthalmologists
Eye doctor, Eyeball doctor
Results
• Example of detected anomaly
Description of the procedure | ID | Specialty
Removal of eye fluid (vitreous) between the lens and retina | 1760485xx6 | Gastroenterology
Preventive retinal detachment treatment by heat or laser | 1760485xx6 | Gastroenterology
Removal of membrane from the retina … | 1760485xx6 | Gastroenterology
Stomach doctor
A stomach doctor doing eye doctor
things (and charging those to CMS)
Other Example -- Bot accounts in SNS network
• Identifying Bot Accounts in SNS network
– Represent SNS network as Graph
– Apply graph algorithm and query
– Find bots and their targets which show different communication patterns
– Find other bots that are connected via the network
Algorithmic Approach – Summary and Issues
• Apply graph algorithms for data analysis
• Still very effective for many problems and applications
– Anomaly detection, Influence identification, Customer segmentation, Topic analysis
…
• Explainability Results based on (deterministic) algorithms computation
• Issues
– You need to know what algorithm can solve your problem
– Want to follow the machine learning (ML) trends
• Exploit existing techniques and tools in ML
• Combine with other ML models
Feeding graph data into ML pipeline
• Goal: apply Machine Learning techniques using graph signals
Need a form that is suitable for feeding into a conventional ML pipeline
but still carries the information in the graph
• … How can this be done?
[Diagram: Raw Data → graph → …… → ML Model]
Agenda
1. Classical Method: Via Graph Algorithms
2. Generating Features From Graph Algorithms
3. Graph Embedding Techniques
• Approach
– Compute (various) graph algorithms to
generate some numeric values
– Feed the output of graph algorithms into ML
model
• Rationale
– Each graph algorithm result contains certain
characteristics of the graph data
– A combination of those results would keep
information about the graph structure
Feature Generation via Graph Algorithm
[Diagram: Raw Data → Graph View → Apply graph algorithms → ML Model]
Example from Security Application
• Security is an important topic, especially
for Cloud
– Identity management
– Threat Protection
– Security management
– ...
• There are many interesting success
stories for using graph for security
problems – Cisco, MS, Amazon, …
Problem: Malware Detection from Network
• Goal
– Analyze network packet captures (PCAPs)
• Traces from malware activities and benign ones
– Apply machine learning techniques
– Learn a model to distinguish malware activities from benign ones.
Graph-Based Approach
• Approach (from DSN’17)
– Rather than analyzing payload of each packet,
collect up packet traces as graphs
– Extract characteristics of those graphs via graph
algorithms
– Train a model → differentiate malware traces from
normal ones
[Figure: Angler EK serving CryptoWall ransomware on 12/21/2015 (DSN’17)]
Observation: malware has different trace patterns than normal activities
• Implementations
– Implemented the technique in the paper on
top of PGX
– Feature generation as proposed in the paper:
node-size, edge-size, avg-degree, avg-eigenvector,
avg-pagerank, avg-clustering coeff, …
– PGX to generate features; ML framework (e.g.
TensorFlow) for classification
– 510 malware pcaps from
https://www.malware-traffic-analysis.net/
– 1110 benign pcaps from
https://www.wireshark.org/download/automated/captures/
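A minimal sketch of the feature-generation step, in plain Python rather than PGX, may make the idea concrete. It computes four of the listed features (node-size, edge-size, avg-degree, avg-clustering coefficient) for one trace graph. The way the trace graph is built here (hosts as vertices, one undirected edge per observed connection), the host addresses, and the name `graph_features` are illustrative assumptions; the slides do not show the PGX calls, and the paper's features also include directed measures that this sketch omits.

```python
from collections import defaultdict
from itertools import combinations

def graph_features(edges):
    """Return a small feature vector (as a dict) for one undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    avg_degree = 2.0 * m / n if n else 0.0

    def local_clustering(node):
        # Fraction of neighbor pairs that are themselves connected.
        nb = adj[node]
        k = len(nb)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        return 2.0 * links / (k * (k - 1))

    avg_clustering = sum(local_clustering(x) for x in adj) / n if n else 0.0
    return {
        "node_size": n,
        "edge_size": m,
        "avg_degree": avg_degree,
        "avg_clustering": avg_clustering,
    }

# Hypothetical trace: a triangle of hosts plus one extra host.
trace = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.3"),
    ("10.0.0.3", "10.0.0.4"),
]
features = graph_features(trace)
print(features)
```

One such dictionary per pcap, flattened into a fixed-order row, would form the table that a classifier (for example a random forest) is trained on.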
Example in Security Application -- Malware Detection
• Data Set : Trace Graph Size Distribution
Number of vertices (malware pcaps) Number of vertices (benign pcaps)
Number of edges (malware pcaps) Number of edges (benign pcaps)
• Feature importance (using Random
Forests)
• Avg-In-Degree (0.191787)
• Avg-Out-Degree (0.183539)
• Avg-Degree-Centrality (0.170083)
• Avg-EigenVector-Centrality (0.106156)
• Volume (0.088323)
• EdgeSize (0.077220)
• Avg-PageRank (0.074146)
• NodeSize (0.064597)
• Avg-Betweenness-Centrality (0.039565)
• Avg-Clustering-Coefficient (0.004584)
• (Demo)
• Accuracy on test set: 100% (Random
Forests), 99.69% (CNN)
• Confusion matrix: (tn: 224, fp: 0, fn: 0, tp: 100)
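As a quick check on the reported figures, the metrics can be recomputed from the confusion matrix. The counts are the ones on the slide; the helper name `classification_metrics` is illustrative. The last line is only arithmetic: on a test set of this size, a single misclassification would give about 99.69%, which is consistent with the CNN figure, although the slides do not state how many errors the CNN made.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive standard metrics from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "total": total,
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Counts reported for the Random Forests model.
rf = classification_metrics(tn=224, fp=0, fn=0, tp=100)
print(rf["total"], rf["accuracy"], rf["precision"], rf["recall"])

# Accuracy if exactly one of the test samples were misclassified.
one_error_accuracy = (rf["total"] - 1) / rf["total"]
print(round(100 * one_error_accuracy, 2))
```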
Exploration Result
Agenda
1. Classic Method: Applying Graph Algorithms
2. Generating Features From Graph Algorithms
3. Graph Embedding Techniques
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, timing, and pricing of any
features or functionality described for Oracle’s products may change and remains at the
sole discretion of Oracle Corporation.
Notice
• Contents in this section discuss recent techniques for combining graphs
and machine learning
• Some of these techniques are experimentally implemented in Oracle Labs
• Therefore they are not part of the Oracle Graph products
Machine Learning and Graphs (revisit)
• There are still issues
– We applied a seemingly arbitrary set of algorithms for extracting features
– Would it work for other applications?
→ Need a systematic methodology that turns graph information into an n-dimensional numeric
representation, i.e. an embedding
We discuss two embedding techniques: vertex embedding and graph embedding
[Diagram, Vertex Embedding: Raw Data → Graph Representation → Numeric Representation (N-dimensional vector) per vertex → ML Model]
[Diagram, Graph(let) Embedding: Raw Data → Multiple Graph Representations → Numeric Representation (N-dimensional vector) per graph → ML Model]
Vertex Embedding
• Goal: to turn graph into n-dimensional
vector
• Want to keep graph (topology)
information
– i.e. distance between entities in the graph should carry over to distance in vector space
Rhicheek Patra, Oracle Labs (ML Summit 2018)
[Diagram: Raw Data → Graph Representation → Numeric Representation (N-dimensional vector) → ML Model]
x, y: data entity (represented as vertex in graph)
v(x), v(y): n-dimensional vector representation of x and y
x, y close in graph → v(x), v(y) close in vector space
• There are several approaches now
– Academia and Industry
• DeepWalk
– An early approach that exploits techniques
from modern NLP
– Word2Vec: an ML technique that learns
closeness between words from a large number
of sentences
– Perform many random walks on the graph and
generate traces.
– Apply the Word2Vec technique to them, treating
vertices as words.
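The trace-generation half of DeepWalk can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the student and course identifiers are hypothetical, the walk is uniform and undirected, and the parameter names (`walks_per_vertex`, `walk_length`) are not from the slides. The resulting walks play the role of "sentences" and would then be handed to any Word2Vec implementation; that training step is deliberately left out here.

```python
import random
from collections import defaultdict

def random_walks(edges, walks_per_vertex=10, walk_length=8, seed=0):
    """Generate uniform random walks; each walk is a list of vertex ids."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    walks = []
    for _ in range(walks_per_vertex):
        starts = list(adj)
        rng.shuffle(starts)             # visit start vertices in random order
        for start in starts:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical student-course graph (students take courses).
enrollment = [
    ("student_a", "10.003"), ("student_a", "10.004"),
    ("student_b", "10.004"), ("student_b", "10.005"),
    ("student_c", "11.103"), ("student_c", "10.005"),
]
walks = random_walks(enrollment, walks_per_vertex=3, walk_length=5)
print(walks[0])
```

Because vertices that share many neighbors co-occur in many walks, a Word2Vec-style model trained on these sequences places them close together in the embedding space.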
How to achieve this? KDD‘14
• Note: you can consider this as a mock-up of
a customer segmentation problem
– Student => Customer
– Course taking => Item or service purchase
– Department => Segmentation label
Example
• Student classification
– A real dataset from a university
– Can you predict a student’s major or department just by
looking at the classmates in the courses that (s)he is
taking?
[Figure: graph of students (labeled by department, e.g. CS, ME) and courses (e.g. 10.003, 10.004, 10.005, 11.103, 11.213, 12.118)]
• (Result #1) Graph-based prediction gives better
result than naïve application of ML (e.g. CNN) on
basic student features (e.g. age, gender,
background, …)
• (Result #2) DeepWalk preserves information
from the graph representation
• (Result #3) DeepWalk allows combining graph
data with other features
Results
CNN on Original Features
PPR (Graph Algorithm)
CNN on Extracted Graph Features
(from deep-walk)
CNN on Original + Graph Features
Graph(let) Embedding
• Objective
– Want to capture closeness (similarity) between graph instances
• Example
– Which classic literary works look more similar, judging from their character relationships?
Odyssey Beowulf Romeo & Juliet Hamlet
Want to have a systematic way to
extract features of these graphs and
compare/classify them
Need to capture irregular structure:
• Arbitrary edges between vertices
• Varying size (# of vertices and
#edges)
• Labels or properties defined on
vertices and edges
• Similarly, apply sequence-based
learning
– i.e. Paragraph2Vec → learn from a large text corpus.
Paragraphs composed of similar words are close in
embedding space
– Consider each graph as a paragraph
– Perform random walks on each graph to generate
traces.
– Apply the Paragraph2Vec model to learn similarity
between graphs
• Added our own improvements
Graph(let) Embedding – How to approach this?
(1) Tricks for encoding multiple properties
(2) When generating traces, consider edges as words (instead of vertices)
(3) Considering certain global properties of each graph -- e.g. size of graph
…
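To make the "graph as paragraph" idea concrete, the sketch below turns one labeled graph into tagged word sequences that a Paragraph2Vec-style model could consume. It is only an illustration: the slides do not say how PG2VEC encodes traces, so emitting edge labels as words and prepending a `size:` token for the vertex count are assumptions chosen here to mirror improvements (2) and (3), and the character relationships in the example are hypothetical.

```python
import random

def edge_traces(graph_id, edges, num_walks=5, walk_length=6, seed=0):
    """Turn one graph into (graph_id, words) documents.

    edges: list of (src, label, dst); walks treat edges as undirected.
    Each document starts with a graph-size token (an assumed encoding)
    followed by the labels of the edges traversed by one random walk.
    """
    rng = random.Random(seed)
    out = {}
    for src, label, dst in edges:
        out.setdefault(src, []).append((label, dst))
        out.setdefault(dst, []).append((label, src))
    size_token = f"size:{len(out)}"
    docs = []
    for _ in range(num_walks):
        current = rng.choice(sorted(out))
        words = [size_token]
        for _ in range(walk_length):
            label, current = rng.choice(out[current])
            words.append(label)          # the edge, not the vertex, is the word
        docs.append((graph_id, words))
    return docs

# Hypothetical character-relationship graph for one literary work.
hamlet = [
    ("Hamlet", "son_of", "Gertrude"),
    ("Hamlet", "friend_of", "Horatio"),
    ("Claudius", "married_to", "Gertrude"),
]
docs = edge_traces("hamlet", hamlet)
print(docs[0])
```

Documents from many graphs, each tagged with its own graph id, would then be passed to a paragraph-vector model so that graphs producing similar traces end up with nearby vectors.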
Graph(let) Embedding – Cheminformatics Example
• (Demo)
• Two Datasets from cheminformatics
– National Cancer Institute (NCI109)
• #Graphs: 4127
• #Vertices: ranges from 35 to 111
• #Edges: ranges from 152 to 476
• Cancer types (binary classification)
– Proteins
• #Graphs: 1113
• #Vertices: ranges from 9 to 620
• #Edges: ranges from 64 to 4048
• Protein types (binary classification)
PG2VEC (ours)
PG2VEC (ours)
Notes
• Graph2Vec is a naïve
implementation of
Paragraph2Vec on graphs
• Our improvements made a
big difference in the quality of
the answer for both datasets
• AML (Anti-money laundering)
– More serious application from
collaboration with FCCM team
– Online Step
• Transactions are monitored in real time
• Any suspicious activity is flagged (lots of
false positives)
– Offline Step (Correlation)
• Flags (alerts) are attached to the global
financial graph
• Identify entities and flags that are closely
connected in the graph
• → Each subgraph creates a case
Other Example – Anti-Money Laundering
[Figure: alerts (AlertA, AlertB, AlertC) grouped by connected subgraph into Case 1, Case 2 and Case 3]
– Evaluation Step
• Compute certain functions on each case to
evaluate its risk factor
• Human (investigator) makes decision
– Looks serious → proceed to official investigation
– Looks benign → close the case
– Don’t know yet. Keep it open → wait to see if
more flags will happen
Algorithmic
approach
Graph ML to help
make this decision?
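The offline correlation step, where each connected subgraph of flagged entities becomes a case, can be sketched with a small union-find. This is a simplification under assumptions: the slides say "closely connected" without defining it, so plain connected components are used here as a stand-in, and the account and alert identifiers and the name `build_cases` are hypothetical.

```python
def build_cases(edges, alerts):
    """Group alerts into cases, one per connected component.

    edges:  iterable of (entity, entity) links in the financial graph
    alerts: dict mapping alert id -> entity the alert is attached to
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in edges:
        union(a, b)

    cases = {}
    for alert_id, entity in alerts.items():
        cases.setdefault(find(entity), set()).add(alert_id)
    return sorted(sorted(c) for c in cases.values())

# Hypothetical accounts and alerts.
links = [("acct1", "acct2"), ("acct2", "acct3"), ("acct4", "acct5")]
alerts = {"AlertA": "acct1", "AlertB": "acct3", "AlertC": "acct5", "AlertD": "acct9"}
cases = build_cases(links, alerts)
print(cases)
```

Each resulting group of alerts, together with the subgraph around it, is what an investigator (or a downstream model) would evaluate as one case.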
• Task #1
– Given a new case, can we find existing cases in
history that look similar to this one? (as a
reference for investigator)
→ Use PG2VEC to train a model and find similar cases
Graph(let) Embedding – Anti-Money Laundering
• Task #2
– Train from existing cases, learn a classifier
– i.e. system recommends that “this case looks
serious. Recommended for official
investigation”
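Once every case has an embedding vector, Task #1 reduces to a nearest-neighbor lookup. The sketch below assumes the embeddings already exist (the three-dimensional vectors and case ids are made up for illustration) and uses cosine similarity, which is a common choice but not one the slides specify.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query, history, k=2):
    """Return the ids of the k historical cases closest to `query`."""
    ranked = sorted(history, key=lambda cid: cosine(query, history[cid]), reverse=True)
    return ranked[:k]

# Hypothetical case embeddings (in practice produced by the embedding model).
history = {
    "case_1": [0.9, 0.1, 0.0],
    "case_2": [0.0, 1.0, 0.2],
    "case_3": [0.8, 0.3, 0.1],
}
new_case = [1.0, 0.2, 0.0]
similar = most_similar(new_case, history, k=2)
print(similar)
```

For Task #2, the same vectors would serve as input features to a classifier trained on the outcomes of past cases.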
Graphs Machine Learning: Recent Trends
• By the way, combining graph and machine learning is a trend
– Many in industry and academia are looking at this problem
– And applying it to solving real problems
Pinterest Alibaba Google
Current Directions
• Improving Scalability
– Increasing the size of graph (e.g. tens of billions of vertices)
• Combining structure (relationships) and other raw observations
– E.g. Item attributes + Co-purchase Information
– Finding a more elegant solution than simple ensemble techniques
Summary
• Use graph signals (relationships between entities) for data analysis
• Applying graph algorithms
• Combining graph with Machine learning – embedding techniques
• Many applications
• We have an implementation in our graph
package (PGX) [BETA only, not in product]
– Load graph model
– Compute graph embedding
– Query embedding directly on graph
– Export graph embedding
Sounds complicated, how can I use this technique easily?
[Diagram: Database or Files → PGX (Graph) → Embedding Export → Other ML Framework]
Resources
• Oracle Spatial and Graph on OTN
www.oracle.com/technetwork/database/options/spatialandgraph
White papers, software downloads, documentation and videos
• Use cases and examples at OpenWorld ’18 Graph presentations page:
https://tinyurl.com/GraphOOW18
• Blog – examples, tips & tricks blogs.oracle.com/oraclespatial
• YouTube channel: https://tinyurl.com/OracleGraphYouTube
• Oracle Big Data Lite Virtual Machine - a free sandbox to get started
www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
– Hands On Lab included in /opt/oracle/oracle-spatial-graph/ or http://github.com/oracle/BigDataLite/
Resources – social media and online communities
• Follow the product team: @SpatialHannes, @JeanIhm, @agodfrin
• Oracle Spatial and Graph SIG user groups (search “Oracle Spatial and
Graph Community”)
Analytics and Data Summit
All Analytics. All Data. No Nonsense.
March 12 – 14, 2019
Formerly called the BIWA Summit with the Spatial and Graph Summit
Same great technical content…new name!
www.AnalyticsandDataSummit.org
Call for Speakers now open!
Submit an abstract to share your use
case or technical session by Jan. 7
AskTOM sessions on property graphs
• Next Spatial and Graph session in Jan/Feb
– Topic to be announced – stay tuned
• View recordings, submit feedback, questions,
topic requests, view upcoming session dates and
topics, sign up to get regular updates
https://devgym.oracle.com/pls/apex/dg/office_hours/3084
Thanks for attending! See you next time.
https://devgym.oracle.com/pls/apex/dg/office_hours/3084
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 

When Graphs Meet Machine Learning

  • 1. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. Oracle Ask TOM Office Hours: When Graphs Meet Machine Learning Sungpack Hong, Research Director, Oracle Labs Jean Ihm, Product Manager @JeanIhm
  • 2. Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | AskTOM sessions on property graphs • Today’s session is the sixth on property graphs • In our past sessions, we showed … – An introduction to Property Graphs, how to model graphs from relational data, perform graph analytics, visualize graphs, query graphs • Today’s topic: When Graphs Meet Machine Learning (use cases) • Visit the Spatial and Graph landing page to view recordings of past sessions; submit feedback, questions, topic requests; view upcoming session dates and topics; sign up: https://devgym.oracle.com/pls/apex/dg/office_hours/3084
  • 3. The Story So Far …
  • 4. Graph Product Options. Oracle Big Data Spatial and Graph: • Available for Big Data platform/BDCS – Hadoop, HBase, Oracle NoSQL • Supported both on BDA and commodity hardware – CDH and Hortonworks • Database connectivity through Big Data Connectors or Big Data SQL • Included in Big Data Cloud Service. Oracle Spatial and Graph: • Available with Oracle 18c/12.2/DBCS • Using tables for graph persistence • Graph views on relational data • In-database graph analytics – Sparsification, shortest path, page rank, triangle counting, WCC, sub graphs • SQL queries possible • Included in Database Cloud Service
  • 5. Architecture of Property Graph • APIs: Java, Groovy, Python, …; Blueprints/Tinkerpop/Gremlin; REST/WebService; PGQL (pgql> SELECT ...) • Graph Analytics: Parallel In-Memory Graph Analytics • Graph Storage Management: Scalable and Persistent Storage (Oracle NoSQL Database, Oracle RDBMS, Apache HBase)
  • 6. Graph Analysis for Business Insight • Identify Influencers • Discover Graph Patterns in Big Data • Generate Recommendations
  • 7. When Graphs Meet Machine Learning: Use Cases
  • 8. Graph Benefits • Why would we want a graph as the data model (as opposed to relational)? • Some benefits of graphs – Intuitive data model – Fast query over multi-hop relationships – Data visualization and interactive exploration – Enhanced data analysis via graph signals
  • 9. Using Graph for Data Analysis • The main idea – A graph captures fine-grained relationships (as edges) between data entities – By using these (materialized) relationships as new signals, we can extract useful information about the original data set – … but exactly how?
  • 10. Agenda: (1) Classical Method: Via Graph Algorithms (2) Generating Features From Graph Algorithms (3) Graph Embedding Techniques
  • 11. Classical Approach: Graph Algorithms • Start from the graph data model • Apply computational graph algorithms (+ graph queries) – E.g. centrality, reachability, closeness, … – A graph algorithm computes specific characteristics of the graph model – Use the algorithm results to answer your question • E.g. which entities are closer to other entities?
  • 12. Example – Anomaly Detection in Medicare Data • Using a public dataset from the US Centers for Medicare and Medicaid Services (CMS) – Health-care billing data for CY 2012 – Aggregated medical transactions: 9,153,272 records with 29 variables – Transactions between 880,644 medical providers and CMS with total amounts > $77B for the year • Data Entities – Medical providers (doctors) – Medical procedures: operations, prescriptions, treatments … – i.e. who is doing what (and charging it to Medicare)
  • 13. Anomaly and Graph Data Model • Anomaly Definition – Doctors of the same specialty provide similar services – What if a doctor performs a lot of treatments that typically belong to other specialties? • E.g. a cardiologist doing plastic surgery? (“There is a spy among us”) • How do we find such cases? → By applying a graph algorithm • CMS data represented as a graph of providers and services (figure labels: Dr. Frankenstein, Prescribe Aspirin, Optometrists) • Approach (sketch) – Pick a specialty – Compute the random-walk distance from the doctors of this specialty group (by applying the personalized PageRank algorithm) – Check whether any outside-specialty doctors are exceptionally close to this specialty group
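The approach sketched on this slide can be illustrated with a small, self-contained example. This is only a sketch under assumptions: the toy provider/service graph, the specialty labels, and the function name `personalized_pagerank` are made up for illustration, and the power iteration below is a textbook personalized PageRank, not the PGX implementation used in the talk.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Textbook power-iteration personalized PageRank on an undirected graph.

    The random walk restarts only at the seed vertices, so a high score
    means "close to the seed group" in random-walk terms.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            share = damping * score[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        score = nxt
    return score


# Hypothetical provider-service graph: doctors linked to the procedures they bill.
edges = [
    ("eye_doc_1", "eye_exam"), ("eye_doc_1", "retina_laser"),
    ("eye_doc_2", "eye_exam"), ("eye_doc_2", "retina_laser"),
    ("eye_doc_1", "office_visit"),
    ("stomach_doc_1", "endoscopy"), ("stomach_doc_1", "office_visit"),
    ("stomach_doc_2", "endoscopy"), ("stomach_doc_2", "office_visit"),
    ("suspect_doc", "eye_exam"), ("suspect_doc", "retina_laser"),
]
specialty = {
    "eye_doc_1": "eye", "eye_doc_2": "eye",
    "stomach_doc_1": "stomach", "stomach_doc_2": "stomach",
    "suspect_doc": "stomach",  # registered as stomach doctor, bills eye procedures
}

# Pick a specialty, restart the walk at its doctors, rank everyone else.
seeds = {doc for doc, spec in specialty.items() if spec == "eye"}
scores = personalized_pagerank(edges, seeds)
outsiders = sorted(
    (doc for doc in specialty if doc not in seeds),
    key=lambda doc: scores[doc],
    reverse=True,
)
```

In this toy graph the outside-specialty doctor who bills only eye procedures ends up at the top of `outsiders`. Note that `office_visit`, a procedure shared across specialties, also pulls the regular stomach doctors slightly towards the eye group.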
  • 14. Dealing with practical issues (1/2) • False positives – From popular common procedures (figure labels: What Eye-doctor does, What every Doctor does, Doctor X, Doctor Y) • Procedures – Subsequent hospital inpatient care, typically 25 minutes per day – Emergency department visit, moderately severe problem – Initial hospital inpatient care, typically 70 minutes per day – Subsequent hospital inpatient care, typically 35 minutes per day – Initial hospital inpatient care, typically 50 minutes per day – … • We can identify such procedures via another graph algorithm: PageRank • And give them special treatment (details omitted here)
  • 15. Dealing with practical issues (2/2) • False positives – From close specialties, which do similar things by nature (figure labels: Eye doctor, Eyeball doctor) • What we did (details omitted here) – Statistically identify such specialties – Treat them differently • Figure: PPR distribution of Optometrists (blue) and PPR distribution of Ophthalmologists (red)
  • 16. Results • Example of detected anomaly (description of the procedure | ID | specialty) – Removal of eye fluid (vitreous) between the lens and retina | 1760485xx6 | Gastroenterology – Preventive retinal detachment treatment by heat or laser | 1760485xx6 | Gastroenterology – Removal of membrane from the retina … | 1760485xx6 | Gastroenterology • A stomach doctor doing eye doctor things (and charging those to CMS)
  • 17. Other Example -- Bot accounts in SNS network • Identifying bot accounts in an SNS network – Represent the SNS network as a graph – Apply graph algorithms and queries – Find bots and their targets, which show different communication patterns – Find other bots that are connected via the network
  • 18. Algorithmic Approach – Summary and Issues • Apply graph algorithms for data analysis • Still very effective for many problems and applications – Anomaly detection, influence identification, customer segmentation, topic analysis … • Explainability: results are based on (deterministic) algorithmic computation • Issues – You need to know which algorithm can solve your problem – We want to follow machine learning (ML) trends • Exploit existing techniques and tools in ML • Combine with other ML models
  • 19. Feeding graph data into ML pipeline • Goal: apply machine learning techniques using graph signals → Need a form that is suitable for feeding into a conventional ML pipeline but still carries the information in the graph • … How can this be done? (diagram: Raw Data → graph → … → ML Model)
  • 20. Agenda: (1) Classical Method: Via Graph Algorithms (2) Generating Features From Graph Algorithms (3) Graph Embedding Techniques
  • 21. Feature Generation via Graph Algorithm • Approach – Run (various) graph algorithms to generate numeric values – Feed the output of the graph algorithms into an ML model • Rationale – Each graph algorithm result captures certain characteristics of the graph data – A combination of those results would keep information about the graph structure (diagram: Raw Data → Graph View → Apply graph algorithms → ML Model)
  • 22. Example from Security Application • Security is an important topic, especially for Cloud – Identity management – Threat protection – Security management – ... • There are many interesting success stories of using graphs for security problems – Cisco, MS, Amazon, …
  • 23. Problem: Malware Detection from Network • Goal – Analyze network packet captures (PCAPs) • Traces from malware activities and benign ones – Apply machine learning techniques – Learn a model to distinguish malware activities from benign ones
  • 24. Graph-Based Approach • Approach (from DSN’17) – Rather than analyzing the payload of each packet, collect packet traces as graphs – Extract characteristics of those graphs via graph algorithms – Train a model → differentiate malware traces from normal ones • Observation: malware has different trace patterns than normal activities (figure from DSN’17: Angler EK serving CryptoWall ransomware on 12/21/2015)
  • 25. Example in Security Application -- Malware Detection • Implementation – Implemented the technique in the paper on top of PGX – Feature generation as proposed in the paper: node-size, edge-size, avg-degree, avg-eigenvector, avg-pagerank, avg-clustering coeff, … – PGX to generate features; ML framework (e.g. TensorFlow) for classification – 510 malware pcaps from https://www.malware-traffic-analysis.net/ – 1110 benign pcaps from https://www.wireshark.org/download/automated/captures/ • Data set: trace graph size distribution (plots: number of vertices and number of edges, for malware pcaps and for benign pcaps)
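To make the feature-generation step concrete, here is a minimal pure-Python sketch that turns one trace graph into a small feature vector. It covers only a few of the listed features (node-size, edge-size, avg-degree, avg-clustering coefficient) and treats the graph as undirected; the function name `graph_features` and the toy edge lists are assumptions for illustration and do not reflect the PGX API.

```python
from itertools import combinations


def graph_features(edges):
    """Compute a few whole-graph features from an undirected edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is stored twice

    def clustering(node):
        # Fraction of neighbor pairs that are themselves connected.
        nb = adj[node]
        k = len(nb)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        return 2.0 * links / (k * (k - 1))

    return {
        "node_size": n,
        "edge_size": m,
        "avg_degree": 2.0 * m / n if n else 0.0,
        "avg_clustering": sum(clustering(v) for v in adj) / n if n else 0.0,
    }


# Two toy "traces": one host fanning out to four peers, and a tight triangle.
star = graph_features([("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")])
triangle = graph_features([("x", "y"), ("y", "z"), ("z", "x")])
```

Each capture then becomes one row of such numbers, which is the form a conventional classifier expects. Even these toy graphs separate cleanly on `avg_clustering` (0.0 for the star, 1.0 for the triangle).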
  • 26. Exploration Result • Feature importance (using Random Forests) – Avg-In-Degree (0.191787) – Avg-Out-Degree (0.183539) – Avg-Degree-Centrality (0.170083) – Avg-EigenVector-Centrality (0.106156) – Volume (0.088323) – EdgeSize (0.077220) – Avg-PageRank (0.074146) – NodeSize (0.064597) – Avg-Betweenness-Centrality (0.039565) – Avg-Clustering-Coefficient (0.004584) • (Demo) • Accuracy on test set: 100% (Random Forests), 99.69% (CNN) • Confusion matrix: (tn: 224, fp: 0, fn: 0, tp: 100)
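The reported confusion matrix can be checked with a few lines of arithmetic. The helper name `metrics` is hypothetical. The last line rests on an assumption the slide does not state: that the CNN was scored on the same 324 test samples, in which case its 99.69% accuracy would correspond to a single misclassification.

```python
def metrics(tn, fp, fn, tp):
    """Standard accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Counts as reported on the slide.
reported = metrics(tn=224, fp=0, fn=0, tp=100)

# Assumption: same 324-sample test set for the CNN, with exactly one error.
one_error_accuracy_pct = round(100 * 323 / 324, 2)
```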
  • 27. Agenda: (1) Classic Method: Applying Graph Algorithms (2) Generating Features From Graph Algorithms (3) Graph Embedding Techniques
  • 28. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
  • 29. Notice • The contents of this section discuss recent techniques for combining graphs and machine learning • Some of these techniques are experimentally implemented in Oracle Labs • Therefore they are not part of the Oracle Graph products
  • 30. Machine Learning and Graphs (revisit) • Still there are issues – We applied a seemingly arbitrary set of algorithms for extracting features – Would it work for other applications? → Need a systematic methodology that turns graph information into an n-dimensional numeric representation, i.e. an embedding • We discuss two embedding techniques: vertex embedding and graph embedding – Vertex Embedding: Raw Data → Graph Representation → Numeric Representation (N-dimensional vector) per vertex → ML Model – Graph(let) Embedding: Raw Data → Multiple Graph Representations → Numeric Representation (N-dimensional vector) per graph → ML Model
  • 31. Vertex Embedding • Goal: turn each vertex of the graph into an n-dimensional vector • Want to keep graph (topology) information – i.e. the distance between entities – x, y: data entities (represented as vertices in the graph) – v(x), v(y): n-dimensional vector representations of x and y – x, y close in graph → v(x), v(y) close in vector space (diagram: Raw Data → Graph Representation → Numeric Representation (N-dimensional vector) → ML Model) Rhicheek Patra, Oracle Labs (ML Summit 2018)
  • 32. How to achieve this? • There are several approaches now – Academia and industry • DeepWalk (KDD’14) – An early approach that exploits techniques from modern NLP – Word2Vec: an ML technique that learns closeness between words from a large number of sentences – Perform many random walks on the graph and generate traces – Apply the Word2Vec technique to them, treating vertices as words Rhicheek Patra, Oracle Labs (ML Summit 2018)
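The random-walk half of DeepWalk is easy to sketch in pure Python. The snippet below only generates the traces ("sentences" whose "words" are vertex ids); training Word2Vec on them is left to an existing implementation such as the one in gensim. The student/course edges and the function names are made up for illustration, and this is not the code used in the talk.

```python
import random


def build_adjacency(edges):
    """Undirected adjacency lists from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def random_walks(adj, walks_per_vertex=10, walk_length=5, seed=0):
    """Generate uniform random walks; each walk is a list of vertex ids."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_vertex):
        starts = list(adj)
        rng.shuffle(starts)  # visit start vertices in a fresh order each pass
        for start in starts:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical student-course graph (a student is linked to courses taken).
edges = [
    ("alice", "10.003"), ("alice", "10.004"),
    ("bob", "10.004"), ("bob", "10.005"),
    ("carol", "11.103"), ("carol", "11.213"),
    ("dave", "11.213"), ("dave", "10.005"),
]
adj = build_adjacency(edges)
walks = random_walks(adj, walks_per_vertex=10, walk_length=5, seed=0)
```

Vertices that co-occur often in these walks (for example students sharing courses) are the ones Word2Vec would place close together in the embedding space.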
  • 33. Example • Student classification – A real dataset from a university – Can you predict a student’s major or department just by looking at the classmates in the courses that (s)he is taking? (figure: students in CS and ME linked to courses 10.003, 10.004, 10.005, 11.103, 11.213, 12.118) • Note: you can consider this a mock-up of a customer segmentation problem – Student => Customer – Course taking => Item or service purchase – Department => Segmentation label Rhicheek Patra, Oracle Labs (ML Summit 2018)
  • 34. Results • (Result #1) Graph-based prediction gives better results than naïve application of ML (e.g. CNN) on basic student features (e.g. age, gender, background, …) • (Result #2) DeepWalk preserves information from the graph representation • (Result #3) DeepWalk allows combining graph data with other features (chart series: CNN on Original Features; PPR (Graph Algorithm); CNN on Extracted Graph Features (from DeepWalk); CNN on Original + Graph Features)
  • 35. Graph(let) Embedding • Objective – Want to capture closeness (similarity) between graph instances • Example – Which works of classic literature look more similar, judging from their character relationships? (Odyssey, Beowulf, Romeo & Juliet, Hamlet) • Want a systematic way to extract features of these graphs and compare/classify them • Need to capture irregular structure: – Arbitrary edges between vertices – Varying size (# of vertices and # of edges) – Labels or properties defined on vertices and edges
  • 36. Graph(let) Embedding – How to approach this? • Similarly applying sequence-based learning – i.e. Paragraph2Vec → learn from a large text corpus; paragraphs composed of similar words are close in embedding space – Consider each graph as a paragraph – Perform random walks on each graph to generate traces – Apply the Paragraph2Vec model to learn similarity between graphs • Added our own improvements – (1) Tricks for encoding multiple properties – (2) When generating traces, consider edges as words (instead of vertices) – (3) Considering certain global properties of each graph, e.g. size of graph …
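One way to picture "consider each graph as a paragraph" with edges as words is the sketch below. The slides do not give the actual encoding used in PG2VEC, so everything here is an assumption for illustration: the token format `srcLabel-edgeLabel-dstLabel`, the leading `size:` token standing in for a global property, and the function name `graph_to_document`.

```python
import random


def graph_to_document(vertex_labels, edges, num_walks=4, walk_length=3, seed=0):
    """Turn one labeled graph into a list of tokens (a "paragraph").

    vertex_labels: dict vertex_id -> label
    edges: list of (src, dst, edge_label), treated as undirected
    """
    rng = random.Random(seed)
    out = {}
    for src, dst, label in edges:
        out.setdefault(src, []).append((dst, label))
        out.setdefault(dst, []).append((src, label))

    # Hypothetical global-property token: graph size.
    tokens = [f"size:{len(vertex_labels)}v/{len(edges)}e"]

    for _ in range(num_walks):
        current = rng.choice(sorted(out))
        for _ in range(walk_length):
            nxt, label = rng.choice(out[current])
            # One word per traversed edge, in the direction of traversal.
            tokens.append(f"{vertex_labels[current]}-{label}-{vertex_labels[nxt]}")
            current = nxt
    return tokens


# Toy molecule-like graph: atoms as vertices, bonds as labeled edges.
vertex_labels = {0: "C", 1: "O", 2: "H"}
edges = [(0, 1, "double"), (0, 2, "single")]
document = graph_to_document(vertex_labels, edges)
```

A collection of such token lists, one per graph, is the kind of input a Paragraph2Vec-style model consumes, with each graph playing the role of one paragraph.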
  • 37. Graph(let) Embedding – Cheminformatics Example • (Demo) • Two datasets from cheminformatics – National Cancer Institute (NCI109) • #Graphs: 4127 • #Vertices: ranges from 35 to 111 • #Edges: ranges from 152 to 476 • Cancer types (binary classification) – Proteins • #Graphs: 1113 • #Vertices: ranges from 9 to 620 • #Edges: ranges from 64 to 4048 • Protein types (binary classification) • Notes – Graph2Vec is a naïve implementation of Paragraph2Vec on graphs – Our improvements made a big difference in the quality of the answer for both datasets (charts: one per dataset, each with a bar labeled PG2VEC (ours))
  • 38. Other Example – Anti-Money Laundering • AML (Anti-money laundering) – More serious application from collaboration with FCCM team – Online Step • Transactions are monitored in real time • Any suspicious activity is flagged (lots of false positives) – Offline Step (Correlation) • Flags (alerts) are attached to the global financial graph • Identify entities and flags that are closely connected in the graph → each subgraph creates a case (figure: alerts AlertA, AlertB, AlertC grouped into Case 1, Case 2, Case 3) – Evaluation Step • Compute certain functions on each case to evaluate its risk factor • Human (investigator) makes the decision – Looks serious → proceed to official investigation – Looks benign → close the case – Don’t know yet → keep it open and wait to see if more flags happen (figure annotations: “Algorithmic approach”; “Graph ML to help making this decision?”)
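The offline correlation step can be sketched as a grouping problem. The slides do not define "closely connected", so this sketch makes a simplifying assumption: two alerts belong to the same case when their entities fall in the same connected component of the (already filtered) transfer graph. The account names, alert ids, and the function name `build_cases` are hypothetical.

```python
def build_cases(transfers, alerts):
    """Group alerts into cases by connected component (union-find).

    transfers: list of (entity_a, entity_b) edges
    alerts: list of (alert_id, entity) pairs
    Returns a sorted list of cases, each a sorted list of alert ids.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in transfers:
        union(a, b)

    cases = {}
    for alert_id, entity in alerts:
        cases.setdefault(find(entity), []).append(alert_id)
    return sorted(sorted(case) for case in cases.values())


transfers = [("acct1", "acct2"), ("acct2", "acct3"), ("acct4", "acct5")]
alerts = [
    ("AlertA", "acct1"), ("AlertB", "acct3"),
    ("AlertC", "acct5"), ("AlertD", "acct9"),
]
cases = build_cases(transfers, alerts)
# AlertA and AlertB share a component; AlertC and AlertD each stand alone.
```

On a real global financial graph a plain connected component would likely be far too large, so a production system would presumably bound the neighborhood (for example by hop count or time window); that refinement is outside this sketch.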
  • 39. Graph(let) Embedding – Anti-Money Laundering • Task #1 – Given a new case, can we find existing cases in history that look similar to this one (as a reference for the investigator)? Use Pg2Vec to train and find • Task #2 – Train on existing cases to learn a classifier – i.e. the system recommends: “This case looks serious. Recommended for official investigation”
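Once every case has an embedding vector, Task #1 reduces to a nearest-neighbor lookup. The sketch below uses cosine similarity over hand-written three-dimensional vectors; the vectors, case ids, and function names are made up, and in practice the embeddings would come from a trained model such as Pg2Vec.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, history, k=2):
    """Return the ids of the k historical cases closest to the query vector."""
    ranked = sorted(history, key=lambda cid: cosine(query, history[cid]), reverse=True)
    return ranked[:k]


# Hypothetical case embeddings (real ones would have many more dimensions).
history = {
    "case_17": [0.9, 0.1, 0.0],
    "case_42": [0.1, 0.8, 0.3],
    "case_63": [0.8, 0.2, 0.1],
}
new_case = [1.0, 0.1, 0.0]
references = most_similar(new_case, history, k=2)
```

The same vectors could also serve Task #2 as input features for a classifier trained on the investigators' past decisions.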
  • 40. Graphs and Machine Learning: Recent Trends • By the way, combining graphs and machine learning is a trend – Many in industry and academia are looking at this problem – And applying it to solve real problems (examples shown: Pinterest, Alibaba, Google)
  • 41. Current Directions • Improving scalability – Increasing the size of the graph (e.g. tens of billions of vertices) • Combining structure (relationships) and other raw observations – E.g. item attributes + co-purchase information – Finding a more elegant solution than simple ensemble techniques Rhicheek Patra, Oracle Labs (ML Summit 2018)
  • 42. Summary • Use graph signals (relationships between entities) for data analysis • Applying graph algorithms • Combining graph with Machine learning – embedding techniques • Many applications
  • 43. Sounds complicated, how can I use this technique easily? • We have an implementation in our graph package (PGX) [BETA only, not in product] – Load graph model – Compute graph embedding – Query embedding directly on graph – Export graph embedding (diagram: Database or Files → PGX (Graph) → Embedding Export → Other ML Framework) Rhicheek Patra, Oracle Labs (ML Summit 2018)
  • 44. Resources
  • 45. Resources • Oracle Spatial and Graph on OTN: www.oracle.com/technetwork/database/options/spatialandgraph – White papers, software downloads, documentation and videos • Use cases and examples at OpenWorld ’18 Graph presentations page: https://tinyurl.com/GraphOOW18 • Blog (examples, tips & tricks): blogs.oracle.com/oraclespatial • YouTube channel: https://tinyurl.com/OracleGraphYouTube • Oracle Big Data Lite Virtual Machine, a free sandbox to get started: www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html – Hands On Lab included in /opt/oracle/oracle-spatial-graph/ or http://github.com/oracle/BigDataLite/
  • 46. Resources – social media and online communities • Follow the product team: @SpatialHannes, @JeanIhm, @agodfrin • Oracle Spatial and Graph SIG user groups (search “Oracle Spatial and Graph Community”)
  • 47. Analytics and Data Summit: All Analytics. All Data. No Nonsense. • March 12 – 14, 2019 • Formerly called the BIWA Summit with the Spatial and Graph Summit. Same great technical content… new name! • www.AnalyticsandDataSummit.org • Call for Speakers now open! Submit an abstract to share your use case or technical session by Jan. 7
  • 48. AskTOM sessions on property graphs • Next Spatial and Graph session in Jan/Feb – Topic to be announced – stay tuned • View recordings, submit feedback, questions, topic requests, view upcoming session dates and topics, sign up to get regular updates: https://devgym.oracle.com/pls/apex/dg/office_hours/3084
  • 49. Thanks for attending! See you next time. https://devgym.oracle.com/pls/apex/dg/office_hours/3084

Editor's Notes

  1. Oracle provides two products for graph data management and analysis: Big Data Spatial and Graph (BDSG) and Oracle Spatial and Graph. Customers have been requesting to develop graph-based applications on either platform, based on their underlying business requirements. Application developers can choose the platform on which to develop a graph solution.
  2. Graph features allow you, among many other things, to efficiently: *extract implicit information from your data using graph analytics *discover graph patterns in big data, such as communities and influencers *generate recommendations based on interests, profiles, and past behaviors
  3. http://slc15bna.us.oracle.com:7008/?root=notebooks&notebook=dsw6Yq44w
  4. Odyssey, Beowulf are more star-shaped, characters have relationship around the main character; while Hamlet has more interactions between sub characters
  5. Here are various resources for more information on the Big Data Spatial and Graph product. Our product pages include data sheets, trial downloads, and documentation. The Big Data Lite VM is a free sandbox environment that you can download to quickly get started using Oracle’s Big Data platform components, including Big Data Spatial and Graph, Oracle Database, and several other technologies; it is also a great way to get your feet wet. You can also find a Hands on Lab where you can work with the vector and raster features we’ve shown today. Check out the blog for examples and code samples. We’re on social media at these handles. Finally, if you are planning to attend the Oracle OpenWorld 2016 conference in San Francisco this fall, we’ll have a number of sessions, labs, and demos around the big data and cloud technologies.
  6. We’d like to mention an upcoming conference that may be of interest. The Analytics and Data Summit (formerly BIWA) will be held with the Oracle Spatial & Graph Summit, in March 2019 at Oracle’s headquarters in Redwood Shores, will include technical content spanning Big Data, Analytics, Spatial & Graph, Cloud, and IoT technologies. This is the premier event for Spatial + Graph, featuring thought leaders, and experts from Oracle as well as our partner and customer community worldwide. The agenda includes technical sessions, and hands on labs for you to get deep dives and work with the technologies, and customer use cases. The call for speakers is now open, and presentations are being selected. We encourage you to submit your use case studies highlighting Oracle’s spatial, graph, analytics, cloud, big data technologies for consideration. We invite you to consider joining us for this event. Call for speakers and registration are now open at www.analyticsanddatasummit.org . A more detailed list of sessions + speakers is available at the event website.