The PokitDok data science team uses many components of the TinkerPop stack along with the Titan graph database. Let it be known, though, that we’re a serious Python shop. As a team, we wanted to do data analytics without context switching between all the languages required to stand up this graph database. We wanted to keep using Python syntax when defining graph schema with the management system, performing graph traversals, building recommendation systems and so on, but the TinkerPop and Titan stacks run on the JVM.
Our solution: connect the development environments with Jython and build out our own Python library for graph traversals. We’ve open sourced this work to help engineers and data scientists work with TinkerPop and Titan from a Python state of mind.
In this talk, PokitDok’s Engineer #1 teams up with a Data Scientist to discuss the intricacies of our development environment, introduce our open-sourced Gremlin-Python library, and explore a graph-based recommendation system. We will step through the underpinnings of Gremlin-Python to create a system that ranks and recommends healthcare professionals.
TinkerPop and Titan from a Python State of Mind
1. TinkerPop and Titan from a Python State of Mind
NYC PyData 2015
Brian Corbin and Denise Gosnell, PhD
Twitter & GitHub: @corbinbs, @denisekgosnell
gremlin-python
2. What is PokitDok?
PokitDok is the operating system for digital health
5. PokitDok APIs
The business of health, for developers.
https://platform.pokitdok.com/
6. TinkerPop and Titan from a Python State of Mind
What we built:
APIs
Our marketplace
The HealthGraph
A Gremlin-Python Library
Why?
Test Drive.
10. HealthGraph: Predictive Models
• What is the probability claim X will be denied?
• A new customer just searched for “family practice”; recommend the best provider within 10 miles.
• Given a CPT code, what is the expected reimbursement rate from insurance company A in zip code 37601?
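The provider-recommendation question above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the provider records, the `score` field, and the function names `haversine_miles` and `recommend` are hypothetical, and the code is a simple filter-and-rank over an in-memory list, not a graph traversal and not the HealthGraph model.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def recommend(providers, specialty, lat, lon, radius_miles=10.0, limit=3):
    """Return names of matching providers inside the radius, best score first."""
    nearby = [
        p for p in providers
        if p["specialty"] == specialty
        and haversine_miles(lat, lon, p["lat"], p["lon"]) <= radius_miles
    ]
    nearby.sort(key=lambda p: p["score"], reverse=True)
    return [p["name"] for p in nearby[:limit]]


# Hypothetical provider records, invented for this example.
PROVIDERS = [
    {"name": "Provider A", "specialty": "family practice",
     "lat": 36.31, "lon": -82.35, "score": 0.72},
    {"name": "Provider B", "specialty": "family practice",
     "lat": 36.35, "lon": -82.40, "score": 0.91},
    {"name": "Provider C", "specialty": "cardiology",
     "lat": 36.32, "lon": -82.36, "score": 0.99},
    {"name": "Provider D", "specialty": "family practice",
     "lat": 35.96, "lon": -83.92, "score": 0.95},
]

if __name__ == "__main__":
    print(recommend(PROVIDERS, "family practice", 36.3134, -82.3535))
```

In a graph setting the `score` would presumably come from a traversal over provider, claim, and location vertices rather than a stored field; the slides do not say how the ranking is computed.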
12. Gremlin-Python Motivation
• Lighter context switching between development tools and environments
• Incompatible syntax issues between Gremlin and Python
• Using Python.
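One concrete source of the syntax mismatch between Gremlin and Python is that several Gremlin step names are reserved words in Python, so they cannot be written as ordinary method calls. The sketch below checks a handful of step names against Python's keyword list. The `python_safe` helper is hypothetical: the trailing underscore follows the general PEP 8 convention for avoiding keyword clashes, and the slides do not say how PokitDok's library handles these names.

```python
import keyword

# A few Gremlin step names; several collide with Python reserved words.
GREMLIN_STEPS = ["as", "in", "and", "or", "out", "both", "has", "count"]


def python_safe(step):
    """Append an underscore when a step name is a Python reserved word.

    Hypothetical helper for illustration only.
    """
    return step + "_" if keyword.iskeyword(step) else step


collisions = [s for s in GREMLIN_STEPS if keyword.iskeyword(s)]

if __name__ == "__main__":
    print("Reserved in Python:", collisions)
    print("Python-safe names:", [python_safe(s) for s in GREMLIN_STEPS])
```

Writing something like `vertex.in("knows")` is a syntax error in Python, which is why a Python-facing traversal library needs some renaming or wrapping scheme for these steps.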
18. Gremlin-Python Test Drive
Option 1: Grab our Docker container
1. Install Docker
https://www.docker.com/docker-toolbox
2. Jump in the “Docker Quickstart Terminal”
3. Fire up our example container:
docker run -i -t pokitdok/gremlin-python-test-drive
Option 2: Shell script install
1. Clone our repo:
https://github.com/pokitdok/gremlin-python
2. Run the set-up scripts:
$ ./test_drive/setup.sh && ./test_drive/run.sh
19. Gremlin-Python Conclusion
Shout-outs:
Jython project
Rexpro Python
Advantages:
Initial exploratory value from quickly standing up graph traversals in Python syntax
Better team integration across API development/data science
20. Gremthon Future Work
1. Transition to Titan 1.0 and TinkerPop 3.0
2. Keep the communication open between the data team and the API team to continue building out this integration
3. Deploy Python implementations of fundamental graph algorithms: BFS, DFS, Dijkstra, etc.
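As an illustration of the kind of algorithm named in the third future-work item, here is a minimal breadth-first search in plain Python. It is a generic textbook sketch over a hypothetical adjacency-list dictionary, not code from the gremlin-python repository, and it does not touch Titan or TinkerPop.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the vertices reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        # Enqueue each unvisited neighbor once, in adjacency-list order.
        for neighbor in graph.get(vertex, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


# Hypothetical toy graph as an adjacency list.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

if __name__ == "__main__":
    print(bfs_order(GRAPH, "a"))
```

Swapping the `deque` for a stack (popping from the same end that is pushed) gives a depth-first variant, and replacing it with a priority queue keyed on path cost is the starting point for Dijkstra's algorithm.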
21. TinkerPop and Titan from a Python State of Mind
Brian Corbin and Denise Gosnell, PhD
gremlin-python
Editor's Notes
Personal story of how I got into graph analytics at this university
Obligatory who-are-we slide
Relevant timing: Xerox is powered by PokitDok
For something the crowd can go see: we made all of our stuff available via API.
transitional purposes only
what kind of data do we have
4.3 million providers
we can also answer all sorts of questions
Current healthcare infrastructure is fractured and antiquated… they can’t answer these questions.
This is a slide about why
data management:
data engineering: loading of data into a database
data science: probabilistic inferences