0629venmoplus

Introducing VenmoPlus.com
-Explore your Venmo network!
Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow

Pipeline
Historical transactions
Real-time transactions

2013
Biggest Challenge:
● Calculate/Query graph distance in real time

Solutions
● Two databases
● Graph algorithm optimizations
● Query/search optimizations
● S3 ⇔ Redis and S3 ⇔ Elasticsearch transfers, distributed with Spark
● ...

A Tale of Two Databases

Two Databases
User ID, name:
420890 Graham Hadley
1630476 Leon Tang
810029 Harminder Toor
1371353 Ephraim Park
562884 Paul Min
User ID, set of friend IDs:
420890 set(14935158, 562884)
1630476 set(1371353)
810029 set(190230, 14935158)
1371353 set(810029, 971156)
562884 set(196371, 1371353)

VenmoPlus.com
m4.xlarge
m4.large
m4.xlarge
m4.large
t2.micro
$29.11/day

About Me
● Postdoc at Lawrence Berkeley National Lab
● PhD in Computer Science, Michigan State
Certified Volunteers:
● Software Carpentry
● Data Carpentry
● American Red Cross
Christmas Eve 2014, ice storm, Michigan

Algorithm 1
Shortest distance -> intersection of sets (friend lists)
● Is (1st-degree friends of A) ∩ (1st-degree friends of B) empty?
● Is (2nd-degree friends of A) ∩ (1st-degree friends of B) empty?
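
The set-intersection checks above can be sketched as follows. This is a minimal illustration, not the deck's actual implementation: it assumes friend lists are held in a plain dict of sets (standing in for the key-value store), that friendship is symmetric, and that distances beyond 3 are not needed. The name `graph_distance` and the sample IDs are hypothetical.

```python
def graph_distance(friends, a, b):
    """Return the distance (0-3) between users a and b, or None if farther.

    `friends` maps a user ID to the set of that user's 1st-degree friends.
    Assumes friendship is symmetric (hypothetical data layout).
    """
    if a == b:
        return 0
    first_a = friends.get(a, set())
    first_b = friends.get(b, set())
    if b in first_a:
        return 1
    # A shared 1st-degree friend means a path of length 2.
    if first_a & first_b:
        return 2
    # 2nd-degree friends of a: the union of the friend lists of a's friends.
    second_a = set().union(*(friends.get(u, set()) for u in first_a))
    second_a.discard(a)
    # Overlap with b's 1st-degree friends means a path of length 3.
    if second_a & first_b:
        return 3
    return None


# Hypothetical chain 1 - 2 - 3 - 4 - 5
friends = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(graph_distance(friends, 1, 4))
```

Each step needs only friend-list lookups and one set intersection, so no distance has to be pre-calculated.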

Algorithm 2
Query the distance between two vertices at a historical moment in a constantly changing graph (because we
don’t pre-calculate the distance...)
● A recent transaction for a user is already history and has changed the graph
● Query the distance between the two users at that moment
○ Do not consider that specific transaction
○ Temporarily remove the influence of that specific transaction, then restore it
■ Test whether that transaction is the first between the pair of users
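
The remove-and-restore idea can be sketched as below. This is an assumption-laden illustration: the graph is a dict of sets, `is_first_between_pair` is a hypothetical flag supplied by the caller (the deck does not say how the test is done), and a simple breadth-first search stands in for whatever distance query is actually used.

```python
from collections import deque


def bfs_distance(friends, a, b):
    """Stand-in distance query: plain breadth-first search over friend sets."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        user, dist = queue.popleft()
        for nxt in friends.get(user, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def distance_before_transaction(friends, a, b, is_first_between_pair):
    """Distance between a and b as it was just before their transaction."""
    if not is_first_between_pair:
        # An earlier transaction already linked the pair; the edge stays.
        return bfs_distance(friends, a, b)
    # Temporarily remove the edge created by this transaction.
    friends.setdefault(a, set()).discard(b)
    friends.setdefault(b, set()).discard(a)
    try:
        return bfs_distance(friends, a, b)
    finally:
        # Restore the edge so the live graph is left unchanged.
        friends[a].add(b)
        friends[b].add(a)


# Hypothetical triangle: users 1, 2, 3 are all connected.
friends = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(distance_before_transaction(friends, 1, 2, is_first_between_pair=True))
```

The `try`/`finally` guarantees the edge is restored even if the query raises.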

Pipeline: raw data, processed in a distributed way

This, or that? - to build graph

This, or that? - for fast searching

Query/Search Optimizations
1. Remove aggregation for better performance… (trade-off)
2. Friend recommender:
a. Use Counter to keep only the 5 users with the most friends in common
3. Search messages within a friend circle
a. Combine Elasticsearch and Redis queries
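
The friend recommender in item 2 might look roughly like this. It is a sketch under assumptions: friend lists live in a dict of sets, candidates are 2nd-degree friends who are not already direct friends, and `recommend_friends` is a hypothetical name. Only `collections.Counter` and its `most_common` method come from the standard library as named on the slide.

```python
from collections import Counter


def recommend_friends(friends, user, top_n=5):
    """Return up to top_n non-friends ranked by number of common friends."""
    direct = friends.get(user, set())
    common = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user and anyone who is already a direct friend.
            if candidate != user and candidate not in direct:
                common[candidate] += 1
    # most_common keeps only the top_n highest counts.
    return [candidate for candidate, _ in common.most_common(top_n)]


# Hypothetical graph: user 4 shares two friends with user 1, user 5 shares one.
friends = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3}, 5: {2}}
print(recommend_friends(friends, 1))
```

Limiting the result with `most_common(top_n)` avoids returning and ranking the full 2nd-degree list on every request.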

● Cache of 2nd degree friends list
● Partitioned GraphDB
● Good for LinkedIn (hundreds of millions of
users, with higher degree)
● 5 million vertices (users)
● 32 million distinct edges (transactions)
● 88 million total edges (transactions)

No cache (precalculation)?
No GraphDB?
