
Introducing VenmoPlus.com 6/27 version

Project for the Insight Data Engineering Program

venmoplus.com

https://github.com/qingpeng/VenmoPlus

Published in: Software

  1. 1. Introducing VenmoPlus.com Explore your Venmo network Qingpeng “Q.P.” Zhang Insight Data Engineering Fellow
  2. 2. Venmo ~= Facebook + PayPal
  3. 3. Demo VenmoPlus.com http://venmoplus.com:8999/#/
  4. 4. Pipeline Historical transactions
  5. 5. Pipeline
  6. 6. Historical transactions Real time transactions Pipeline
  7. 7. 2013 Biggest Challenge: ● Calculate/Query graph distance in real time
  8. 8. ● Cache of 2nd degree friends list ● Partitioned GraphDB ● Good for LinkedIn (hundreds of millions of users, with higher degree) ● 5 million vertices (users) ● 32 million distinct edges (transactions) ● 88 million total edges (transactions)
  9. 9. ● Cache of 2nd degree friends list ● Partitioned GraphDB ● Good for LinkedIn (hundreds of millions of users, with higher degree) ● 5 million vertices (users) ● 32 million distinct edges (transactions) ● 88 million total edges (transactions) No cache (precalculation)? No GraphDB?
  10. 10. Historical transactions Real time transactions Two Databases
  11. 11. Two Databases
      Vertices (userID -> userName): 420890 Graham Hadley; 1630476 Leon Tang; 810029 Harminder Toor; 1371353 Ephraim Park; 562884 Paul Min
      Edges (userID -> set of userIDs): 420890 set(14935158, 562884); 1630476 set(1371353); 810029 set(190230, 14935158); 1371353 set(810029, 971156); 562884 set(196371, 1371353)
  12. 12. Two Databases
  13. 13. This, or that? - to build graph
  14. 14. This, or that? - for fast searching
  15. 15. Historical transactions Real time transactions Two Databases
  16. 16. Lesson learned
  17. 17. VenmoPlus.com m4.xlarge m4.large m4.xlarge m4.large t2.micro $29.11/day
  18. 18. About Me ● PhD in Computer Science ● BS in Physics Volunteers: ● Software Carpentry ● Data Carpentry ● American Red Cross Christmas Eve 2014, ice storm, Michigan
  19. 19. Algorithm Optimization Shortest distance -> intersection of sets (friend lists) ● 1st degree friends of A ∩ 1st degree friends of B == [] ? ● 2nd degree friends of A ∩ 1st degree friends of B == []?
  20. 20. Algorithm Design - 2 Query the distance between vertices at a historic moment in a constantly changing graph (because we don't pre-calculate the distance). ● A recent transaction for a user is already history and has changed the graph ● Query the distance between the two users at that moment ○ (not considering that specific transaction) ○ Temporarily remove the influence of that specific transaction, then restore it ■ Test whether that transaction is the first between the pair of users.
  21. 21. Node, service, instance type, $/hour, $/day:
      1  Spark            m4.large   0.12   2.88
      2  Spark            m4.large   0.12   2.88
      3  Redis            m4.xlarge  0.24   5.76
      4  Elasticsearch    m4.xlarge  0.24   5.76
      5  Elasticsearch    m4.xlarge  0.24   5.76
      6  Kafka, producer  m4.large   0.12   2.88
      7  Kafka            m4.large   0.12   2.88
      8  Webserver        t2.micro   0.013  0.312
      Total: $29.11 per 24 hours
      See https://github.com/qingpeng/VenmoPlus for more details!
  22. 22. Algorithms Distance detection between vertices in graph (1st, 2nd, 3rd friends?) ● 1st degree friends of A ∩ 1st degree friends of B == [] ? ● 2nd degree friends of A ∩ 1st degree friends of B == []?
  23. 23. Producer [10] [7,8] [1-6] [1-6] [4,5,6] [1] Backend/API [9] Frontend [9] [2,3]
  24. 24. Redis: ● Graph Edges: userID -> userID ● Graph Vertices: userID -> userName In-memory DB -> fast graph updating and graph traversal, in real time Elasticsearch: ● Everything about the transactions Distributed -> data storage and full-text search, in real time Big Challenge: ● Graph distance + common connections in real time
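
The set-intersection idea on slides 19, 20 and 22 can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: plain dicts of sets stand in for the Redis friend sets, the names `add_edge`, `friends` and `degree` are hypothetical, and the sketch assumes that a transaction connects both users symmetrically. The edge list is the sample data from slide 11.

```python
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def add_edge(graph: Graph, a: int, b: int) -> None:
    """Record a transaction between a and b (assumed symmetric)."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def friends(graph: Graph, user: int, ignore: Optional[Edge] = None) -> Set[int]:
    """1st degree friends of user, optionally pretending one edge is absent."""
    result = set(graph.get(user, set()))
    if ignore is not None and user in ignore:
        other = ignore[1] if user == ignore[0] else ignore[0]
        result.discard(other)
    return result


def degree(graph: Graph, a: int, b: int,
           ignore: Optional[Edge] = None) -> Optional[int]:
    """Return 1, 2 or 3 if a and b are that many hops apart, else None."""
    friends_a = friends(graph, a, ignore)
    friends_b = friends(graph, b, ignore)
    if b in friends_a:
        return 1
    # 1st degree friends of A intersected with 1st degree friends of B
    if friends_a & friends_b:
        return 2
    # 2nd degree friends of A intersected with 1st degree friends of B
    second_a: Set[int] = set()
    for f in friends_a:
        second_a |= friends(graph, f, ignore)
    second_a.discard(a)
    if second_a & friends_b:
        return 3
    return None


# Sample edges from slide 11, treated as undirected for this sketch.
SAMPLE_EDGES = [
    (420890, 14935158), (420890, 562884), (1630476, 1371353),
    (810029, 190230), (810029, 14935158), (1371353, 810029),
    (1371353, 971156), (562884, 196371), (562884, 1371353),
]
graph: Graph = {}
for u, v in SAMPLE_EDGES:
    add_edge(graph, u, v)
```

For the "historic moment" query on slide 20, a caller could pass `ignore=(a, b)` when the transaction being displayed is the first one between the pair, so the distance is computed as if that edge did not exist yet; nothing is deleted from the stored graph, so there is nothing to restore afterwards.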
