Graph Day Texas: Open Source Graph Projects from PokitDok

A tour of the PokitDok HealthGraph and some open source graph projects
Graph Day Texas, Jan 2016
Denise Gosnell, PhD
Twitter and Github:
@pokitdok
@denisekgosnell

Confidential 2
PokitDok APIs:
The business of health,
for developers.
https://platform.pokitdok.com/

PokitDok APIs: Marketplace

Doctor on Demand:
Powered by PokitDok

Talk Outline:
• What we built: the HealthGraph
• What we've open sourced:
  • A Gremlin-Python library
  • A custom Titan build
  • Dynamic JSON Graph [WIP]
  • HealthGraph DSL [WIP]

X12 Data Standard:
ETL hell from the 1970s

X12 Data Standard:

HealthGraph: Transactions as Trees
• We treat transactions as first-class objects in the graph.
• Buried deep within each X12 transaction are the entities of interest.
Interactive graph available at:
https://fullmetalhealth.com/dsl/
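To make "transactions as trees" concrete, here is a minimal Python sketch (not PokitDok's implementation) that nests X12 segments under their HL (hierarchical level) parents and then pulls out the NM1 (entity name) segments, which is where providers, subscribers, and payers are named. It assumes the common default delimiters (`~` between segments, `*` between elements) instead of reading them from the ISA header, and the sample string is a toy fragment, not a complete or valid 837 claim. In the 837 format, HL level code 20 is the billing provider level and 22 the subscriber level. The `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One hierarchical level (HL) of a transaction and the segments under it."""
    hl_id: str
    level_code: str
    segments: list = field(default_factory=list)
    children: list = field(default_factory=list)


def parse_segments(raw, seg_term="~", elem_sep="*"):
    # Assumes the common default delimiters; real files declare their own in ISA.
    return [s.strip().split(elem_sep) for s in raw.split(seg_term) if s.strip()]


def build_tree(segments):
    # A synthetic root stands for the transaction itself, so the transaction
    # is a first-class node and every entity hangs somewhere beneath it.
    root = Node(hl_id="0", level_code="TRANSACTION")
    by_id = {"0": root}
    current = root
    for seg in segments:
        if seg[0] in ("ST", "SE"):
            root.segments.append(seg)  # transaction header/trailer stay on the root
        elif seg[0] == "HL":
            # HL01 = this level's ID, HL02 = parent ID (empty at the top), HL03 = level code
            hl_id, parent_id, level_code = seg[1], seg[2] or "0", seg[3]
            current = Node(hl_id=hl_id, level_code=level_code)
            by_id[parent_id].children.append(current)
            by_id[hl_id] = current
        else:
            current.segments.append(seg)  # other segments belong to the latest HL
    return root


def entities(node, path=()):
    # Depth-first walk yielding (level path, NM101 entity code, NM103 name).
    path = path + (node.level_code,)
    for seg in node.segments:
        if seg[0] == "NM1":
            yield path, seg[1], seg[3]
    for child in node.children:
        yield from entities(child, path)


# Toy fragment: a billing provider (NM1*85) with one subscriber (NM1*IL) beneath it.
raw = "ST*837*0001~HL*1**20*1~NM1*85*2*ACME CLINIC~HL*2*1*22*0~NM1*IL*1*DOE*JANE~SE*6*0001~"
print(list(entities(build_tree(parse_segments(raw)))))
```

Each yielded path records where in the tree an entity was found, which is exactly the context that is lost when X12 is flattened into rows.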

HealthGraph:
Property Graph Model

HealthGraph: Probabilistic Inferences

HealthGraph:
Data Inferences

HealthGraph: Predictive Models
• What is the probability claim X will be denied?
• A new customer just searched for “family practice”;
recommend the best provider within 10 miles.
• Given a CPT code, what is the expected
reimbursement rate from insurance company A in zip
code 37601?
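The reimbursement question has a simple baseline before any model is involved: average what insurer A actually paid for that CPT code in that zip code. The Python sketch below does that, and the same grouping yields an empirical denial rate as a baseline for the first question. It is an illustration under stated assumptions, not PokitDok's predictive model: the `Claim` record and its fields are hypothetical, "reimbursement rate" is taken here to mean paid divided by billed, and the denial estimate uses add-one (Laplace) smoothing so that sparse cells do not return 0 or 1.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Claim:
    # Hypothetical, simplified claim record; real remittance data is far richer.
    cpt: str
    payer: str
    zip_code: str
    billed: float
    paid: float
    denied: bool


def _matching(claims, cpt, payer, zip_code):
    # All historical claims in the same (CPT code, payer, zip code) cell.
    return [c for c in claims if (c.cpt, c.payer, c.zip_code) == (cpt, payer, zip_code)]


def expected_reimbursement_rate(claims, cpt, payer, zip_code) -> Optional[float]:
    """Mean paid/billed ratio over non-denied claims in the cell, or None if no history."""
    ratios = [c.paid / c.billed for c in _matching(claims, cpt, payer, zip_code)
              if not c.denied and c.billed > 0]
    return mean(ratios) if ratios else None


def denial_rate(claims, cpt, payer, zip_code, prior=1.0) -> float:
    """Smoothed fraction of denied claims in the cell: (denied + 1) / (n + 2)."""
    cell = _matching(claims, cpt, payer, zip_code)
    denied = sum(c.denied for c in cell)
    return (denied + prior) / (len(cell) + 2 * prior)
```

A learned model would replace these cell averages once there are features beyond the three grouping keys, but the baseline is useful for checking whether a model actually adds anything.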

HealthGraph: Top 100k Providers
Twitter and Github:
@pokitdok
@MacraeAlec

PokitDok Open Source:
Gremlin Python

Our HealthGraph Production Stack
• Titan 0.5.3
• TinkerPop's Blueprints 2.5.0
• Cassandra and Elasticsearch
• Gremlin-Python

Gremlin-Python Motivation
• Less context switching between development tools and environments
• Avoiding syntax incompatibilities between Gremlin and Python
• Getting to use Python
Twitter and Github:
@corbinbs
@denisekgosnell

Gremlin-Python Test Drive
Option 1: Grab our Docker container
1. Install Docker:
https://www.docker.com/docker-toolbox
2. Open the "Docker Quickstart Terminal".
3. Start our example container:
docker run -i -t pokitdok/gremlin-python-test-drive
Option 2: Shell script install
1. Clone our repo:
https://github.com/pokitdok/gremlin-python
2. From the repository root, run the setup scripts:
$ ./test_drive/setup.sh && ./test_drive/run.sh

Bipartite Graph Recommendation System
[Diagram: Customer vertices connected to Doctor vertices by viewed and scheduled_with edges]

Bipartite Graph
[Diagram: Customer vertices connected to Doctor vertices by viewed edges]
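# Count scheduled_with edges per doctor (the edge's in-vertex), keyed by full name.
# ranked_docs is assumed to be a map created before this call; group_count fills it.
(g.E.has('edge_type', 'scheduled_with')
    .in_v()
    .group_count(ranked_docs,
                 lambda it: it.full_name,
                 lambda it: it.b + 1.0))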
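Without a Titan instance at hand, the same ranking can be reproduced over a plain edge list. The Python sketch below counts how many `scheduled_with` edges point at each doctor, keyed by full name, which mirrors what the `group_count` call above appears to accumulate into `ranked_docs` (the `+ 1.0` increment is kept). It then adds one hypothetical extension that is not in the talk: recommend doctors scheduled by customers who viewed the same doctors as a given customer. The `(customer, label, doctor)` tuple layout, the `full_name` lookup, and both function names are illustrative assumptions.

```python
from collections import Counter


def rank_doctors(edges, full_name):
    """Count incoming scheduled_with edges per doctor, keyed by full name."""
    counts = Counter()
    for _customer, label, doctor in edges:
        if label == "scheduled_with":
            counts[full_name[doctor]] += 1.0  # same increment as it.b + 1.0
    return counts


def recommend(edges, customer, full_name, top_n=3):
    """Hypothetical extension: doctors scheduled by customers who co-viewed."""
    viewed = {d for c, l, d in edges if c == customer and l == "viewed"}
    # Peers: other customers who viewed at least one of the same doctors.
    peers = {c for c, l, d in edges if l == "viewed" and d in viewed and c != customer}
    already = {d for c, l, d in edges if c == customer and l == "scheduled_with"}
    counts = Counter(
        full_name[d] for c, l, d in edges
        if c in peers and l == "scheduled_with" and d not in already
    )
    return [name for name, _ in counts.most_common(top_n)]
```

Walking customer, then doctor, then customer, then doctor is the usual way to get recommendations out of a bipartite graph; in Gremlin it is a short chain of `out`/`in` steps followed by the same group count.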

Gremlin-Python Test Drive

PokitDok Open Source:
Custom Build of Titan 0.5.3 to
Integrate with CDH5 Containers

Motivation for Release of Custom Build:
• Graph production stack: Titan 0.5.x ships with Hadoop 2.2.
• API production stack: Cloudera's CDH5 containers with Hadoop 2.6.0.
• You guessed it: infrastructure dependency errors upon integration, because the Hadoop 2.6.0 API is not fully backward compatible with Hadoop 2.2.

Released:
A modification of the Titan 0.5.3 build that upgrades it to Hadoop 2.6.0 and resolves numerous conflicts among transitive dependencies.
… someone had to do it.
Grab it here:
https://github.com/pokitdok/titan/tree/0.5.3-hadoop2.6.0
• Tested with Cassandra but not HBase.
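Resolving conflicts among transitive dependencies starts with finding them. As a rough illustration, and not the process used for this build, the Python sketch below walks a dependency tree and reports every artifact that is requested at more than one version, the situation a Hadoop 2.2 versus 2.6.0 mix produces. The input format (nested dictionaries keyed by `group:artifact:version`) is a hypothetical stand-in for real build-tool output such as a Maven dependency tree, and `find_conflicts` is an illustrative name.

```python
from collections import defaultdict


def find_conflicts(tree):
    """Return {"group:artifact": {version: [paths that pull it in]}} for conflicts.

    `tree` maps "group:artifact:version" coordinates to their own subtrees.
    """
    seen = defaultdict(lambda: defaultdict(list))

    def walk(node, path):
        for coord, children in node.items():
            group, artifact, version = coord.split(":")
            here = path + (coord,)
            # Record which dependency path requested this exact version.
            seen[f"{group}:{artifact}"][version].append(" -> ".join(here))
            walk(children, here)

    walk(tree, ())
    # Keep only artifacts that were requested at two or more distinct versions.
    return {ga: dict(versions) for ga, versions in seen.items() if len(versions) > 1}
```

The recorded paths show which top-level dependency drags in the stale version, which is the information needed to decide whether to exclude it, pin it, or upgrade the dependency that requires it.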

HealthGraph Dynamic JSON Load
Open Source Version [WIP]

Dynamic JSONLoader:
Goal: bulk-load JSON from HDFS sequence files straight into a Titan DB.
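The loader is not released yet, so the following is only a guess at the general shape of a "dynamic" JSON-to-graph mapping, not PokitDok's code. Each JSON object becomes a vertex holding its scalar fields as properties, each nested object (or list of objects) becomes a child vertex linked by an edge labeled with its key, and the resulting elements are grouped into fixed-size batches, since bulk loads typically commit in chunks. Reading HDFS sequence files and writing to Titan are left out; the function names, the tuple layout, and the `batch_size` default are assumptions.

```python
import itertools
import json


def json_to_elements(doc, label="root", ids=None):
    """Flatten one JSON object into ("vertex", id, label, props) and
    ("edge", out_id, edge_label, in_id) tuples; each edge follows both endpoints."""
    ids = itertools.count() if ids is None else ids
    elements = []

    def visit(obj, vlabel):
        vid = next(ids)
        props = {k: v for k, v in obj.items() if not isinstance(v, (dict, list))}
        elements.append(("vertex", vid, vlabel, props))
        for key, value in obj.items():
            # Lists of scalars are skipped for brevity; a real loader would store them.
            for child in value if isinstance(value, list) else [value]:
                if isinstance(child, dict):
                    child_id = visit(child, key)  # the key becomes the child's label
                    elements.append(("edge", vid, key, child_id))
        return vid

    visit(doc, label)
    return elements


def batches(elements, batch_size=1000):
    """Yield consecutive chunks so each chunk can be committed as one transaction."""
    it = iter(elements)
    while chunk := list(itertools.islice(it, batch_size)):
        yield chunk


def elements_from_json_lines(lines, batch_size=1000):
    ids = itertools.count()  # shared, so vertex IDs stay unique across documents
    elements = itertools.chain.from_iterable(
        json_to_elements(json.loads(line), ids=ids) for line in lines if line.strip()
    )
    return batches(elements, batch_size)
```

Nothing in the mapping depends on a fixed schema, which is what makes it "dynamic"; the cost is that labels and property keys are only as clean as the incoming JSON.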

Dynamic JSONLoader Future Work
1. Extract PokitDok HealthGraph-specific features
2. Move to Titan 1.0 and TP3 compatibility
3. Release on PokitDok GitHub

HealthGraph DSL
Open Source Version [WIP]

X12 Data Standard:

X12 Spec Trees vs. Graph DSL:
Interactive graph available at:
https://fullmetalhealth.com/dsl/

Graph DSL with TinkerPop 2.5:

DSL Future Work
1. Move to Titan 1.0 and TP3 compatibility
2. Release on PokitDok GitHub
3. Current open question: we are looking for (and looking forward to) more documentation on implementing custom Gremlin steps (DSLs) in TP3.
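Custom steps are the core of a graph DSL: domain vocabulary such as "billing providers on this patient's claims" is defined once in terms of generic traversal steps. Because this is not a TinkerPop API, the Python sketch below illustrates only the idea: a tiny fluent traversal with generic `out` and `has` steps, plus a subclass that adds domain-specific steps built from them. Every class name, step name, edge label, and the adjacency-map graph layout are hypothetical.

```python
class Traversal:
    """Minimal fluent traversal over an adjacency map (not a TinkerPop API)."""

    def __init__(self, graph, current):
        # graph: {vertex_id: {"props": {...}, "out": {edge_label: [vertex_ids]}}}
        self.graph = graph
        self.current = list(current)

    def _next(self, ids):
        # Return the same class so domain steps remain chainable after generic ones.
        return type(self)(self.graph, ids)

    def out(self, label):
        return self._next(
            w for v in self.current for w in self.graph[v]["out"].get(label, [])
        )

    def has(self, key, value):
        return self._next(v for v in self.current if self.graph[v]["props"].get(key) == value)

    def values(self, key):
        return [self.graph[v]["props"].get(key) for v in self.current]


class HealthTraversal(Traversal):
    """Hypothetical domain steps composed from the generic ones above."""

    def claims(self):
        return self.out("submitted")

    def billing_providers(self):
        return self.out("billed_by")

    def specialty(self, name):
        return self.has("specialty", name)
```

A query then reads in domain terms, for example `HealthTraversal(graph, ["p1"]).claims().billing_providers().specialty("family practice").values("name")`, while the graph-level steps stay in one place.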

Reach Out
Dev Blog: FullMetalHealth.com
@PokitDok @DeniseKGosnell
Denise.Gosnell@pokitdok.com
