Graph Database Modeling for Kaggle COVID-19 Dataset
MedFuse
JONATHAN HERKE, June 05, 2020
© 2020 TigerGraph. All Rights Reserved
Introduction
Jonathan Herke
Developer/Technical Evangelist
- BS in Computer Information Systems
- Founder of Futurist Academy (non-profit)
- 8 years as a Computer Networking Engineer in the Army National Guard
- 6+ years building healthcare solutions as an EIR and Graph Engineer
TigerGraph Summary
Helping our customers improve the world with deeper insights
• The only scalable graph database for the enterprise
• Foundational for AI and ML solutions
• Designed for efficient concurrent OLTP and OLAP workloads
• SQL-like query language (GSQL) accelerates developer time to solution
Our customers
Founded in 2012 with HQ in Redwood City, California
Why Graph and Why Now
Graph enables organizations of all sizes to ask their best questions of connected data
Graph integrates data from multiple internal and increasingly diverse external sources
Larger, richer, and more diverse datasets mean more entities can be analyzed as more connections are understood
According to Gartner, over the next five years the graph analytics market will see 100% compound annual growth
Why Graph and Why Now
In-Graph Analytics, Transactions, + AI/ML at scale
Why Graph Analytics and Why Not RDBMS?
Figure: example graph with Customer, Order, Product, Payment, Supplier, Location 1, and Location 2 vertices connected by MAKES, PURCHASED, RESIDES, SHIPS TO, SHIPS FROM, ACCEPTED, and NOTIFIES edges
Graph databases store relationships alongside the entities they connect.
Unlike an RDBMS, graph analytics combined with ML/AI provides unique insights into connected data for key use cases:
• Customer 360/member journey
• Anti-fraud/anti-money laundering
• Supply chain optimization
Graph answers the best questions about
connected data with fast, deep, and wide insights
Source: Gartner - Top 10 Data and Analytics Trends for 2019
https://tinyurl.com/y2slxtq7
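The idea of storing relationships next to the entities can be sketched in a few lines of Python. This is a minimal, hypothetical in-memory model: the `PropertyGraph` class, the vertex IDs, and the edge directions are assumptions loosely based on the diagram, not TigerGraph's storage format or API. Each vertex keeps its outgoing edges grouped by type, so a multi-hop question becomes a walk along edge types rather than a chain of table joins.

```python
from collections import defaultdict


class PropertyGraph:
    """Hypothetical toy graph: each vertex stores its outgoing edges by type."""

    def __init__(self):
        self.out = defaultdict(lambda: defaultdict(list))

    def add_edge(self, src, etype, dst):
        self.out[src][etype].append(dst)

    def follow(self, start, path):
        """Walk a sequence of edge types from start; return the vertices reached."""
        frontier = {start}
        for etype in path:
            # .get avoids creating empty entries for vertices with no such edge
            frontier = {dst for v in frontier
                        for dst in self.out.get(v, {}).get(etype, [])}
        return frontier


# Illustrative data only; edge directions are assumptions.
g = PropertyGraph()
g.add_edge("customer:alice", "MAKES", "order:1001")
g.add_edge("customer:alice", "MAKES", "order:1002")
g.add_edge("order:1001", "PURCHASED", "product:masks")
g.add_edge("order:1002", "PURCHASED", "product:gloves")
g.add_edge("product:masks", "SHIPS_FROM", "location:1")
g.add_edge("product:gloves", "SHIPS_FROM", "location:2")
g.add_edge("customer:alice", "RESIDES", "location:1")

# "Where do the products Alice ordered ship from?" is a 3-hop walk.
origins = g.follow("customer:alice", ["MAKES", "PURCHASED", "SHIPS_FROM"])
print(sorted(origins))  # ['location:1', 'location:2']
```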
Google: Knowledge Graph
https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html
Use Cases
● Seamlessly integrate multiple sources of data to provide a unified and comprehensive view of each member
● Find similar members with the click of a button, in real time
● Deliver care path recommendations for similar members
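Finding similar members can be illustrated with a simple neighborhood-overlap score. The slide does not say which similarity measure is used, so this sketch assumes Jaccard similarity over the sets of conditions each member is linked to; the member IDs and conditions are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, neighbors, k=2):
    """Rank other members by Jaccard similarity of their neighbor sets."""
    scores = [(other, jaccard(neighbors[target], neighbors[other]))
              for other in neighbors if other != target]
    # Highest score first; break ties by member ID for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical member -> condition links (the "neighbors" in the graph).
member_conditions = {
    "member:A": {"asthma", "diabetes", "hypertension"},
    "member:B": {"diabetes", "hypertension"},
    "member:C": {"asthma"},
    "member:D": {"migraine"},
}

print(most_similar("member:A", member_conditions))
```

Similar members found this way could then be used as the basis for care path recommendations, but that step is not sketched here.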
TigerGraph Platform Overview
Features / Design Differences / Benefits
Real-Time Deep-Link Queries (5 to 10+ hops deep)
● Native parallel graph for correctness and efficiency
● C++ engine for category-leading performance
● Massively parallel processing for cloud scale
In-graph concurrent transactions, deep-link analytics, and AI/ML at scale
Massive Scale
● Distributed architecture with continuous availability
● Efficient graph storage reduces memory footprint
● All enterprise and external datasets supported
● Automatic partitioning and active-active HA
In-Database Analytics
GSQL: a Turing-complete, SQL-like query language that enables unique analytics features such as accumulators and a user-extensible graph algorithm library
Strong consistency and graph analytics in a single logical cluster, even across multiple datacenters, simplify deployment and accelerate time to insight
What's New in v3.0?
● No-code migration from RDBMS
● No-code Visual Query Builder
● User-configurable indexing
● Faster deployment across distributed infrastructures
● All the clouds (TGcloud on AWS, Azure, soon GCP)
● Democratize self-service graph analytics to derive new insights from legacy/external data stores
● Scale dynamic environments from on-premises to all the clouds
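A deep-link query of "5 to 10+ hops" is, at its core, a frontier-by-frontier expansion. The sketch below is plain Python, not GSQL: it expands breadth-first up to a hop limit and keeps a per-vertex hop count and a per-level counter, an analogy (only loosely) to how a GSQL query can attach accumulators to vertices and to the query as a whole. The graph and the function name `k_hop` are hypothetical.

```python
from collections import defaultdict


def k_hop(adj, source, max_hops):
    """Expand breadth-first from source for at most max_hops hops.

    dist records the smallest hop count at which each vertex was reached
    (analogous to a per-vertex minimum accumulator), and per_hop counts
    newly reached vertices at each level (analogous to a global counter).
    """
    dist = {source: 0}
    per_hop = defaultdict(int)
    frontier = [source]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for v in frontier:
            for u in adj.get(v, ()):
                if u not in dist:  # BFS: first visit is the shortest hop count
                    dist[u] = hop
                    per_hop[hop] += 1
                    next_frontier.append(u)
        if not next_frontier:
            break  # nothing new to expand; stop early
        frontier = next_frontier
    return dist, dict(per_hop)


# Hypothetical directed graph used only for illustration.
adj = {"v0": ["v1"], "v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"], "v4": ["v5"]}
dist, per_hop = k_hop(adj, "v0", max_hops=3)
print(dist)     # {'v0': 0, 'v1': 1, 'v2': 2, 'v3': 2, 'v4': 3}
print(per_hop)  # {1: 1, 2: 2, 3: 1}
```

Each hop touches only the current frontier, which is why the per-hop work can be spread across parallel workers; the sketch itself is single-threaded.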
TigerGraph Platform Overview
Components shown in the architecture diagram:
• Input data: Operational Data, Master Data, DBs, Spark, Kafka, Files
• ETL Data Loader
• Core engines: Graph Storage Engine (GSE) and Graph Processing Engine (GPE), with Parallel Query Processing and Data Snapshots
• Storage services: Graph Data Storage, ID Service, Indexing
• Interfaces: GSQL Queries (GSQL Server), Visual Design UI (GraphStudio Server), RESTful APIs (RESTPP), serving user queries and graph algorithms
• Message Queuing (Spark / Kafka, Zookeeper)
• Outputs and consumers: Business Intelligence, Analytics, Visualization, Dashboards, Reports, Data Warehouses, Master Data Stores, Machine Learning
Graph Algorithms + ML
Graph: Clustering, Betweenness, Similarity, Degree, PageRank, Recommend, Shortest Path, Connected, Centrality, Detection
ML / Data Science: Graph Convolutional Networks (GCN), Temporal Pattern Detect, Louvain, Dependency Networks (RPN), Markov Networks (RDN), Probabilistic Models (PRM)
https://www.geeksforgeeks.org/graph-data-structure-and-algorithms/
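PageRank, one of the algorithms listed above, can be sketched with plain power iteration. This is a minimal textbook version, not TigerGraph's GSQL library implementation; the damping factor 0.85, the fixed 50 iterations, and the toy graph are assumptions. Vertices with no out-edges (dangling vertices) spread their rank evenly over all vertices so the scores keep summing to 1.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of vertex -> list of out-neighbors."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleport term shared equally by every vertex.
        new = {v: (1.0 - damping) / n for v in nodes}
        # Rank held by dangling vertices is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not adj.get(v))
        for v in nodes:
            new[v] += damping * dangling / n
        # Each vertex splits its rank evenly across its out-edges.
        for v, targets in adj.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for u in targets:
                    new[u] += share
        rank = new
    return rank


# Hypothetical toy graph: "c" is linked to by a, b, and d.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print({v: round(r, 3) for v, r in sorted(ranks.items())})
```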
Modeling for Kaggle COVID-19 Dataset
Documents of data → Run machine learning on text (NLP) → Semantically link → GSQL queries (algos)
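The "semantically link" step can be sketched as follows, assuming NER has already produced (paper, entity text, entity type) rows. Everything here is hypothetical: the rows, the lowercase normalization, and the choice to weight a paper-to-paper link by the number of shared entities are illustrative assumptions, not the workshop's actual schema or queries.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical NER output: (paper_id, entity_text, entity_type).
rows = [
    ("paper1", "Hydroxychloroquine", "CHEMICAL"),
    ("paper1", "COVID-19", "DISEASE"),
    ("paper2", "hydroxychloroquine", "CHEMICAL"),
    ("paper2", "COVID-19", "DISEASE"),
    ("paper2", "pneumonia", "DISEASE"),
    ("paper3", "COVID-19", "DISEASE"),
    ("paper3", "remdesivir", "CHEMICAL"),
]


def link_papers(rows):
    """Group papers by entity, then weight paper-paper links by shared entities."""
    papers_by_entity = defaultdict(set)
    for paper, text, etype in rows:
        # Normalize case so "Hydroxychloroquine" and "hydroxychloroquine"
        # become the same entity vertex.
        papers_by_entity[(text.lower(), etype)].add(paper)
    weights = Counter()
    for papers in papers_by_entity.values():
        for a, b in combinations(sorted(papers), 2):
            weights[(a, b)] += 1
    return papers_by_entity, weights


entities, weights = link_papers(rows)
print(weights.most_common())
# [(('paper1', 'paper2'), 2), (('paper1', 'paper3'), 1), (('paper2', 'paper3'), 1)]
```

In a graph database, the entity groups would be Paper-to-Entity edges, and the pairwise weights would typically be computed at query time rather than stored.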
Links
Workshop Files on the Cloud
● Data
○ Sample.csv
○ normalizedAuthors.csv
○ Entity_bc5cdr.csv
○ Entity_bionlp13cg.csv
○ Entity_craft
○ Entity_jnlpba
● Export_1588865407302.tar.gz
● Sample Queries
https://tgcloud.io
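Loading one of the entity files into vertex and edge sets might look like the sketch below. The deck does not show the files' columns, so the header names `paper_id`, `entity`, and `label` are assumptions, as is the inline sample that stands in for a real file such as Entity_bc5cdr.csv.

```python
import csv
import io

# Hypothetical CSV layout; substitute the real column names from the file.
SAMPLE = """paper_id,entity,label
p1,COVID-19,DISEASE
p1,covid-19,DISEASE
p2,Remdesivir,CHEMICAL
p2,,CHEMICAL
"""


def load_entity_csv(handle):
    """Return (paper vertices, entity vertices, HAS_ENTITY edges), deduplicated."""
    papers, entities, edges = set(), set(), set()
    for row in csv.DictReader(handle):
        text = row["entity"].strip()
        if not text:
            continue  # skip rows with an empty entity mention
        entity_id = (text.lower(), row["label"])  # case-insensitive entity key
        papers.add(row["paper_id"])
        entities.add(entity_id)
        edges.add((row["paper_id"], entity_id))
    return papers, entities, edges


papers, entities, edges = load_entity_csv(io.StringIO(SAMPLE))
print(len(papers), len(entities), len(edges))  # 2 2 2
```

Against a real file, the same function could be passed a handle from `open(path, newline="")` once the actual column names are substituted.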
Named Entity Recognition
Akash Kaul is a rising junior at Washington University in St. Louis studying computer science on the pre-med track. His passion is exploring the intersection of technology and medicine, including informatics, AI, and healthcare databases.
Demo
Download (link in chat)
Data Folder
Demo
Go to (link in chat)
https://tgcloud.io
Demo
Go to (link in chat)
Named Entity Recognition
Demo
Go to (link in chat)
https://tgcloud.io
THANK YOU!
Connect
Jonathan Herke
Developer/Technical Evangelist, TigerGraph
https://www.linkedin.com/in/jonherke/
Resources
Developer Community
• TG Community Forum community.tigergraph.com
• TG Community Chat discord.gg/F2c9b9v
• Reddit reddit.com/r/tigergraph/
• YouTube youtube.com/tigergraph
• LinkedIn linkedin.com/company/tigergraph/
• Twitter twitter.com/tigergraphdb
• Twitch twitch.tv/tigergraph
• GitHub github.com/tigergraph/ecosys

Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
