Introducing VenmoPlus.com
-Explore your Venmo network!
Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow
Pipeline
● Historical transactions
● Real-time transactions
2013
Biggest Challenge:
● Calculate/Query graph distance in real time
● Cache of 2nd degree friends list
● Partitioned GraphDB
● Good for LinkedIn (hundreds of millions of users, with higher degree)
● 5 million vertices (users)
● 32 million distinct edges (transactions)
● 88 million total edges (transactions)
No cache (precalculation)?
No GraphDB?
Two Databases
420890 Graham Hadley
1630476 Leon Tang
810029 Harminder Toor
1371353 Ephraim Park
562884 Paul Min
420890 set(14935158, 562884)
1630476 set(1371353)
810029 set(190230,14935158)
1371353 set(810029,971156)
562884 set(196371,1371353)
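The two tables above can be read as two key-value mappings: one from a user ID to a display name, and one from a user ID to the set of user IDs it has transacted with. Below is a minimal in-memory sketch in Python, using plain dicts as a stand-in for the real store. The helper name `friend_labels` is hypothetical, and the data is only the sample shown on the slide.

```python
# Vertex table: userID -> userName (sample values copied from the slide).
user_names = {
    420890: "Graham Hadley",
    1630476: "Leon Tang",
    810029: "Harminder Toor",
    1371353: "Ephraim Park",
    562884: "Paul Min",
}

# Edge table: userID -> set of userIDs (sample values copied from the slide).
friends = {
    420890: {14935158, 562884},
    1630476: {1371353},
    810029: {190230, 14935158},
    1371353: {810029, 971156},
    562884: {196371, 1371353},
}


def friend_labels(user_id):
    """Return labels for a user's 1st-degree friends, ordered by userID.

    Hypothetical helper: falls back to the raw ID when no name is stored.
    """
    return [user_names.get(f, str(f)) for f in sorted(friends.get(user_id, set()))]


print(friend_labels(420890))
```

Keeping names and edges in separate mappings means a graph traversal only touches small sets of integers, and names are looked up once at the end.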
Optimizations
● Two databases
● Graph algorithms optimization
● S3 ⇔ Redis and S3 ⇔ Elasticsearch transfers, distributed with Spark
● ...
VenmoPlus.com
m4.xlarge
m4.large
m4.xlarge
m4.large
t2.micro
$29.11/day
About Me
● Postdoc in Lawrence Berkeley National Lab
● PhD in Computer Science, Michigan State
● BS in Physics, Nanjing U.
Certified Volunteers:
● Software Carpentry
● Data Carpentry
● American Red Cross
Christmas Eve 2014, ice storm, Michigan
Algorithm Optimization
Shortest distance -> intersection of sets (friend lists)
● 1st degree friends of A ∩ 1st degree friends of B == [] ?
● 2nd degree friends of A ∩ 1st degree friends of B == []?
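The two intersection tests above can be sketched as a small Python function. This is an illustration under stated assumptions, not the project's actual code: the function name `degree`, the toy graph, and the assumption that friend lists are symmetric (if A lists B, B lists A) are all introduced here.

```python
def degree(a, b, friends):
    """Return 0, 1, 2 or 3 for the graph distance between a and b, or None if
    they are more than three steps apart.

    `friends` maps each user to the set of its 1st-degree friends and is
    assumed to be symmetric.
    """
    if a == b:
        return 0
    friends_a = friends.get(a, set())
    friends_b = friends.get(b, set())
    if b in friends_a:
        return 1
    # 1st-degree friends of A ∩ 1st-degree friends of B is non-empty -> distance 2.
    if friends_a & friends_b:
        return 2
    # 2nd-degree friends of A: everyone reachable through one of A's friends.
    second_a = set().union(*(friends.get(f, set()) for f in friends_a))
    # 2nd-degree friends of A ∩ 1st-degree friends of B is non-empty -> distance 3.
    if second_a & friends_b:
        return 3
    return None


# Toy chain graph 1 - 2 - 3 - 4 - 5 (hypothetical data).
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(degree(1, 2, chain), degree(1, 3, chain), degree(1, 4, chain), degree(1, 5, chain))
```

Each step only needs friend sets that are already stored, so no precomputed 2nd-degree cache is required; the 2nd-degree set is built on demand from the 1st-degree sets.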
Algorithms Design -2
Query the distance between two vertices at a past moment in a constantly changing graph (because we don't pre-calculate the distance)
● A recent transaction for a user is already history and has changed the graph
● Query the distance between the two users at that moment
○ Do not consider that specific transaction
○ Temporarily remove the influence of that specific transaction, then restore it
■ Test whether that transaction is the first between the pair of users
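One way to read the steps above, sketched in Python. The names `degree`, `degree_before_latest`, and the per-pair counter `tx_count` are assumptions made for this illustration, as is the symmetric friend-set layout; the slides do not say how the "first transaction" test is implemented.

```python
def degree(a, b, friends):
    """Distance 1, 2 or 3 via set intersections, or None if farther apart."""
    friends_a, friends_b = friends.get(a, set()), friends.get(b, set())
    if b in friends_a:
        return 1
    if friends_a & friends_b:
        return 2
    second_a = set().union(*(friends.get(f, set()) for f in friends_a))
    return 3 if second_a & friends_b else None


def degree_before_latest(a, b, friends, tx_count):
    """Distance between a and b just before their latest transaction.

    `tx_count` (hypothetical) maps frozenset({a, b}) to the number of
    transactions seen so far between the pair, including the latest one.
    """
    if tx_count.get(frozenset((a, b)), 0) > 1:
        # Not the first transaction: the edge already existed, distance was 1.
        return 1
    # First transaction: hide the new edge, measure, then restore it.
    friends[a].discard(b)
    friends[b].discard(a)
    try:
        return degree(a, b, friends)
    finally:
        friends[a].add(b)
        friends[b].add(a)


# Toy data: users 1 and 3 share friend 2 and have just transacted for the first time.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
counts = {frozenset((1, 3)): 1}
print(degree_before_latest(1, 3, graph, counts))
```

The `try`/`finally` keeps the graph intact even if the distance query raises, which matters when the same structure also serves live queries.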
#  Role             Instance   $/hour  $/day
1  Spark            m4.large   0.12    2.88
2  Spark            m4.large   0.12    2.88
3  Redis            m4.xlarge  0.24    5.76
4  Elasticsearch    m4.xlarge  0.24    5.76
5  Elasticsearch    m4.xlarge  0.24    5.76
6  Kafka, producer  m4.large   0.12    2.88
7  Kafka            m4.large   0.12    2.88
8  Web server       t2.micro   0.013   0.312
https://github.com/qingpeng/VenmoPlus for more details!
$29.11/24hours
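The daily total can be checked with a few lines of Python. The sketch assumes the two numeric columns are the hourly rate and the 24-hour cost in US dollars, which is consistent with each row (for example 0.12 × 24 = 2.88); the variable names are illustrative.

```python
HOURS_PER_DAY = 24

# (role, instance type, assumed hourly rate in USD), one entry per node.
nodes = [
    ("Spark", "m4.large", 0.12),
    ("Spark", "m4.large", 0.12),
    ("Redis", "m4.xlarge", 0.24),
    ("Elasticsearch", "m4.xlarge", 0.24),
    ("Elasticsearch", "m4.xlarge", 0.24),
    ("Kafka, producer", "m4.large", 0.12),
    ("Kafka", "m4.large", 0.12),
    ("Web server", "t2.micro", 0.013),
]

# Sum the hourly rates, scale to one day, and round to cents.
total_per_day = round(sum(rate for _, _, rate in nodes) * HOURS_PER_DAY, 2)
print(f"${total_per_day:.2f} per 24 hours")
```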
Algorithms
Distance detection between vertices in the graph (1st-, 2nd-, or 3rd-degree friends?)
● 1st degree friends of A ∩ 1st degree friends of B == [] ?
● 2nd degree friends of A ∩ 1st degree friends of B == []?
Pipeline
Redis:
● Graph Edges: userID -> userID
● Graph Vertices: userID -> userName
In-memory DB -> fast graph updates and graph traversal in real time
Elasticsearch:
● Everything about the transactions
Distributed -> data storage and full-text search in real time
Big Challenge:
● Graph distance + Common connections in real time
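The split described above can be sketched with in-memory stand-ins: two mappings play the role of the graph store (names and edges) and a list plays the role of the search index that keeps full transaction records. Everything here is hypothetical, including the transaction field names (`actor`, `target`, `message`), the function names, and the sample users; no real Redis or Elasticsearch client is used.

```python
names = {}      # stand-in for the vertex store: userID -> userName
edges = {}      # stand-in for the edge store: userID -> set of userIDs
documents = []  # stand-in for the search index: full transaction records


def ingest(tx):
    """Route one transaction to both stores."""
    actor, target = tx["actor"], tx["target"]
    names[actor["id"]] = actor["name"]
    names[target["id"]] = target["name"]
    # Record the edge in both directions so friend sets stay symmetric.
    edges.setdefault(actor["id"], set()).add(target["id"])
    edges.setdefault(target["id"], set()).add(actor["id"])
    documents.append(tx)


def common_connections(a, b):
    """Names of users connected to both a and b, sorted alphabetically."""
    shared = edges.get(a, set()) & edges.get(b, set())
    return sorted(names.get(u, str(u)) for u in shared)


ingest({"actor": {"id": 1, "name": "Ann"}, "target": {"id": 2, "name": "Bob"}, "message": "lunch"})
ingest({"actor": {"id": 3, "name": "Cy"}, "target": {"id": 2, "name": "Bob"}, "message": "rent"})
print(common_connections(1, 3))
```

Only IDs and names go to the graph side, so the structure used for traversal stays small, while the message text and other fields live where full-text search can reach them.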
This, or that? - to build graph
This, or that? - for fast searching
Lesson learned
Qingpeng zhang week5