Eventbrite Data Platform and Services - Interest Graph Based Recommendations
Published in: Technology, Education


  • 1. Data Platform and Services • Vipul Sharma and Eyal Reuveni
  • 2. Agenda • Eventbrite • Data Products • Data Platform • Recommendations • Questions
  • 3. • A social event ticketing and discovery platform • 50 millionth ticket sold • Revenue doubled YoY • 180 employees in SoMa, SF • Solving significant engineering problems • Data • Data, Infrastructure, Mobile, Web, Scale, Ops, QA • Firing on all cylinders and hiring at a blazing pace
  • 4. Data Products
  • 5. Analytics • Ad hoc queries by analysts
  • 6. Fraud and Spam
  • 7. Data Platform
  • 8. Hadoop Cluster • 30 persistent EC2 High-Memory Instances • 30 TB disk with replication factor of 2, ext3 formatted • CDH3 • Fair Scheduler • HBase
  • 9. Infrastructure • Search: Solr; incremental updates, moving towards event-driven • Recommendation/Graph: Hadoop; native Java MapReduce; Bash for workflow • Persistence: MySQL; HDFS; HBase; MongoDB (investigating Cassandra and Riak)
  • 10. Infrastructure • Stream: RabbitMQ; internal firehose (investigating Kafka) • Offline: MapReduce; Streaming; Hive; Hue
  • 11. Infrastructure - Sqoozie • Workflow for MySQL imports to HDFS: generate Sqoop commands; run these imports in parallel • Transparent to schema changes • Include or exclude at column, data type, or table level • Data type casting: tinyint(1) → Integer • Distributed table imports
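
The Sqoozie slide describes generating Sqoop commands from table metadata, with include/exclude rules and type casts. A minimal sketch of that idea follows. It is not Sqoozie's actual code: the table metadata, JDBC URL, target paths, and helper names are all hypothetical, and the commands are only built and printed, not executed. The Sqoop flags used (`--connect`, `--table`, `--columns`, `--target-dir`, `--map-column-java`) are standard Sqoop 1 import options.

```python
import shlex
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table metadata (column -> MySQL type). A real workflow would
# read this from the database so that schema changes are picked up.
TABLES = {
    "orders": {"id": "int(11)", "event_id": "int(11)",
               "is_paid": "tinyint(1)", "notes": "text"},
    "events": {"id": "int(11)", "name": "varchar(255)",
               "is_public": "tinyint(1)"},
}

# Assumed cast rule from the slide: tinyint(1) is imported as Integer.
TYPE_CASTS = {"tinyint(1)": "Integer"}


def build_sqoop_command(table, columns, jdbc_url, target_root, exclude_columns=()):
    """Build one `sqoop import` argument list for a single table."""
    kept = [c for c in columns if c not in exclude_columns]
    casts = [f"{c}={TYPE_CASTS[columns[c]]}" for c in kept if columns[c] in TYPE_CASTS]
    cmd = ["sqoop", "import", "--connect", jdbc_url, "--table", table,
           "--columns", ",".join(kept), "--target-dir", f"{target_root}/{table}"]
    if casts:
        cmd += ["--map-column-java", ",".join(casts)]
    return cmd


def build_all(tables, jdbc_url, target_root, exclude_tables=(), exclude_columns=None):
    """Build commands for every table that is not excluded, in name order."""
    exclude_columns = exclude_columns or {}
    return [
        build_sqoop_command(t, cols, jdbc_url, target_root, exclude_columns.get(t, ()))
        for t, cols in sorted(tables.items())
        if t not in exclude_tables
    ]


def run_in_parallel(commands, runner, max_workers=4):
    """Apply `runner` to each command concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


commands = build_all(TABLES, "jdbc:mysql://db.example.com/eventbrite", "/data/mysql",
                     exclude_columns={"orders": ["notes"]})
# Dry run: the runner only renders each command as a shell string.
for line in run_in_parallel(commands, shlex.join):
    print(line)
```

To actually launch the imports, the dry-run runner could be swapped for a function that calls `subprocess.run` on each argument list.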
  • 12. Infrastructure - Blammo • Raw logs are imported to HDFS via Flume • Almost real-time: 5 min latency • Logs are key-value pairs in JSON • Each log producer publishes its schema in YAML • Hive schema and schema YAML are kept in sync using Thrift • Control exclusion and inclusion
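
One way to picture the Blammo slide's "schema in YAML, Hive schema in sync" idea is to derive both the Hive table definition and the log-line projection from the same schema object. The sketch below is an illustration only: the schema contents, the `exclude` flag, and the function names are hypothetical, the schema is shown as the dict a YAML parser would return, and the Thrift-based sync mentioned on the slide is not modeled. A real table over JSON logs would also need a JSON SerDe clause, which is omitted here.

```python
import json

# Hypothetical producer schema, as the dict a YAML parser would return.
PAGE_VIEW_SCHEMA = {
    "name": "page_view",
    "fields": [
        {"name": "user_id", "type": "bigint"},
        {"name": "event_id", "type": "bigint"},
        {"name": "url", "type": "string"},
        {"name": "debug_blob", "type": "string", "exclude": True},
    ],
}


def included_fields(schema):
    """Fields that survive the (assumed) per-field exclusion flag."""
    return [f for f in schema["fields"] if not f.get("exclude", False)]


def hive_ddl(schema, location):
    """Render a Hive external table definition from the schema."""
    cols = ",\n  ".join(f"{f['name']} {f['type']}" for f in included_fields(schema))
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {schema['name']} (\n  {cols}\n)\n"
        f"LOCATION '{location}'"
    )


def project_log_line(line, schema):
    """Parse one JSON log line and keep only the included fields."""
    record = json.loads(line)
    return {f["name"]: record.get(f["name"]) for f in included_fields(schema)}


print(hive_ddl(PAGE_VIEW_SCHEMA, "/logs/page_view"))
print(project_log_line(
    '{"user_id": 1, "event_id": 2, "url": "/e/2", "debug_blob": "x"}',
    PAGE_VIEW_SCHEMA,
))
```

Because both outputs come from one schema object, adding or excluding a field changes the table definition and the projected rows together.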
  • 13. Recommendations
  • 14. You will like to attend this event
  • 15. Recommendation Engines • Interest Graph Based (Your friends who like rock music like you are attending Eric Clapton Event – Eventbrite) • Social Graph Based (Your friends like Lady Gaga so you will like Lady Gaga, PYMK – Facebook, LinkedIn) • Collaborative Filtering – Item-Item similarity (You like Godfather so you will like Scarface - Netflix) • Collaborative Filtering – User-User Similarity (People who bought camera also bought batteries - Amazon) • Item Hierarchy (You bought camera so you need batteries - Amazon)
  • 16. Why Interest? • Events are Social • Events are Interest • Dense Graph is Irrelevant • Interests are Changing
  • 17. How do we know your Interest? • We ask you • Based on your activity: Events Attended; Events Browsed • Facebook Interests: User Interest has to match Event category; Static • Machine Learning: Logistic Regression using MLE; Sparse Matrix is generated using MapReduce; A model for each interest
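
The slide mentions logistic regression fit by maximum likelihood over sparse features, with one model per interest. A minimal single-machine sketch of that technique follows. It uses stochastic gradient ascent on the log-likelihood, which is one common way to approximate the MLE and not necessarily the method used by the authors. The feature names, training rows, and hyperparameters are invented for illustration, and the MapReduce step that builds the sparse matrix is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """P(user has the interest) for a sparse feature dict."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    return sigmoid(z)


def train_interest_model(rows, epochs=200, lr=0.5):
    """Fit one interest model by stochastic gradient ascent on the log-likelihood.

    rows: list of (sparse feature dict, label) pairs with label in {0, 1}.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in rows:
            # d(log-likelihood)/dz for one example is (label - prediction).
            error = label - predict(weights, bias, features)
            bias += lr * error
            for k, v in features.items():
                weights[k] = weights.get(k, 0.0) + lr * error * v
    return weights, bias


# Hypothetical training rows for a single interest, "rock_music".
ROWS = [
    ({"attended:rock_concert": 1.0, "browsed:rock_festival": 1.0}, 1),
    ({"attended:rock_concert": 1.0}, 1),
    ({"attended:wine_tasting": 1.0}, 0),
    ({"browsed:tech_meetup": 1.0, "attended:wine_tasting": 1.0}, 0),
]

# "A model for each interest": keep one (weights, bias) pair per interest name.
models = {"rock_music": train_interest_model(ROWS)}

w, b = models["rock_music"]
p_rock = predict(w, b, {"attended:rock_concert": 1.0})
p_wine = predict(w, b, {"attended:wine_tasting": 1.0})
print(p_rock > 0.5, p_wine < 0.5)
```

Storing weights in a dict keyed by feature name means only features that actually occur take up space, which is the practical point of a sparse representation.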
  • 18. Model Based vs Clustering • Item-Item vs User-User • Building Social Graph is Clustering Step • Social Graph Recommendation is a Ranking Problem
  • 19. Implicit Social Graph [Diagram: user nodes U1–U5 and event nodes E1–E4]
  • 20. Mixed Social Graph [Diagram: user nodes U1–U5, event nodes E1–E3, and FB and LI nodes]
  • 21. 15M * 260 * 260 ≈ 1.01 Trillion Edges • 4 Billion edges ranked • Each node is a feature vector representing a User • Each edge is a feature vector representing a Relationship
  • 22. Feature Generation • Mixed Features • A series of map-reduce jobs • Output on HDFS in flat files; Input to subsequent jobs • Orders = Event → Attendees: MAP: eid: uid; REDUCE: eid:[uid] • Attendees → Social Graph: Input: eid:[uid]; MAP: uid_i:[uid]; REDUCE: uid:[neighbors] • Interest based features, user specific, graph mining etc • Upload feature values to HBase
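
The two jobs on the Feature Generation slide (orders to event attendees, then attendees to a per-user neighbor list) can be sketched with a tiny in-memory stand-in for MapReduce. This is an illustration under stated assumptions, not the production Java jobs: `run_mapreduce`, the mapper and reducer names, and the sample orders are all hypothetical.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one MapReduce job (map, group by key, reduce)."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return [reducer(k, vs) for k, vs in sorted(groups.items())]


# Job 1: Orders -> event attendees.  MAP: eid: uid   REDUCE: eid: [uid]
def map_order(order):
    uid, eid = order
    yield eid, uid


def reduce_attendees(eid, uids):
    return eid, sorted(set(uids))


# Job 2: Attendees -> social graph.  MAP: uid_i: [uid]   REDUCE: uid: [neighbors]
def map_attendees(record):
    _eid, uids = record
    for uid in uids:
        yield uid, [other for other in uids if other != uid]


def reduce_neighbors(uid, neighbor_lists):
    return uid, sorted(set(chain.from_iterable(neighbor_lists)))


# Hypothetical (user, event) orders.
ORDERS = [("U1", "E1"), ("U2", "E1"), ("U3", "E1"), ("U3", "E2"), ("U4", "E2")]

attendees = run_mapreduce(ORDERS, map_order, reduce_attendees)
graph = dict(run_mapreduce(attendees, map_attendees, reduce_neighbors))
print(attendees)
print(graph)
```

The output of the first job is the input of the second, mirroring the slide's note that each job writes flat files that feed subsequent jobs.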
  • 23. [Diagram: user nodes U1, U2, U3]
  • 24. HBase
  • 25. HBase • Collect data from multiple MapReduce jobs • Stores entire social graph • Over one million writes per second
  • 26. HBase
      rowid   | neighbors | events | featureX
      2718282 | 101       | 3      | 0.3678795
  • 27. HBase
      rowid   | 314159:n | 314159:e | 314159:fx | 161803:n | 161803:e | 161803:fx
      2718282 | 31       | 1        | 0.3183    | 83       | 2        | 0.618
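
Slide 27 appears to show a wide-row layout in which each column qualifier is a neighbor's id plus a feature suffix (`:n`, `:e`, `:fx`), so one row holds every edge for a user. The sketch below models that layout with plain dictionaries instead of an HBase client. The reading of the suffixes as per-edge versions of the neighbors, events, and featureX columns from slide 26 is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for one HBase table: rowid -> {column qualifier: value}.
table = defaultdict(dict)


def put_edge(table, uid, neighbor, n, e, fx):
    """Write one edge's features into the user's row as '<neighbor>:<suffix>' columns."""
    row = table[uid]
    row[f"{neighbor}:n"] = n
    row[f"{neighbor}:e"] = e
    row[f"{neighbor}:fx"] = fx


def edges(table, uid):
    """Read a user's row back as {neighbor: {suffix: value}}."""
    grouped = defaultdict(dict)
    for qualifier, value in table[uid].items():
        neighbor, suffix = qualifier.rsplit(":", 1)
        grouped[neighbor][suffix] = value
    return dict(grouped)


# Values taken from the slide's example row.
put_edge(table, "2718282", "314159", n=31, e=1, fx=0.3183)
put_edge(table, "2718282", "161803", n=83, e=2, fx=0.618)

print(sorted(table["2718282"]))
print(edges(table, "2718282"))
```

With this layout, fetching all of a user's edges is a single row read, and separate jobs can each write their own feature columns without coordinating.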
  • 28. Tips & Tricks • Distributed cache database • Sped up some MapReduce jobs by hours • Be sure to use counters!
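
A common use of a distributed cache is a map-side join: each task loads a small lookup file locally and joins against it in the mapper, avoiding a shuffle. The sketch below illustrates that pattern together with counters for tracking unmatched records. It is an assumption about what the slide refers to, not the authors' code. The lookup contents and function names are hypothetical, and `collections.Counter` stands in for Hadoop's job counters.

```python
from collections import Counter


def load_lookup(lines):
    """Parse 'event_id<TAB>category' lines, as a task might after opening a cached file."""
    return dict(line.rstrip("\n").split("\t", 1) for line in lines)


def map_with_lookup(records, lookup, counters):
    """Map-side join: attach a category to each (uid, eid) record, counting misses."""
    for uid, eid in records:
        category = lookup.get(eid)
        if category is None:
            counters["missing_event"] += 1  # visible in job stats instead of failing silently
            continue
        counters["joined"] += 1
        yield uid, category


# Hypothetical lookup file contents and input records.
lookup = load_lookup(["E1\tmusic\n", "E2\ttech\n"])
counters = Counter()
joined = list(map_with_lookup([("U1", "E1"), ("U2", "E9"), ("U3", "E2")], lookup, counters))

print(joined)
print(dict(counters))
```

Counting the dropped records makes it obvious when the cached lookup is stale or incomplete, which is otherwise easy to miss in a long-running job.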
  • 29. Tips & Tricks • Hive (ab)uses • Almost as many Hive jobs as custom ones • “flip join” • Statistical functions using Hive • UDF
  • 30. Tips & Tricks • Memory Memory Memory • LZO, WAL • Combiners are great until • Shuffle and Sorting stage • Hadoop ecosystem is still new
  • 31. Questions?