Data Platform and Services

  Vipul Sharma and Eyal Reuveni
Agenda
• Eventbrite
• Data Products
• Data Platform
• Recommendations
• Questions
Eventbrite
• A social event ticketing and discovery platform
• 50 millionth ticket sold
• Revenue doubled year over year
• 180 employees in SoMa, San Francisco
• Solving significant engineering problems
    • Data, Infrastructure, Mobile, Web, Scale, Ops, QA
• Firing on all cylinders and hiring fast
www.eventbrite.com/jobs
Data Products
Analytics
• Ad hoc queries by analysts
Fraud and Spam
Data Platform
Hadoop Cluster
• 30 persistent EC2 High-Memory instances
• 30 TB of disk, HDFS replication factor of 2, ext3-formatted
• CDH3 (Cloudera's Hadoop distribution)
• Fair Scheduler
• HBase
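A quick sanity check on the cluster numbers, assuming the 30 TB figure is total raw disk across all nodes (the slide does not say) and ignoring space HDFS reserves for non-DFS use. With a replication factor of 2, every block is stored twice, so logical capacity is roughly half of raw. The function and constant names below are illustrative only.

```python
def usable_hdfs_capacity_tb(raw_tb: float, replication: int) -> float:
    """Approximate logical HDFS capacity: each block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication


RAW_TB = 30       # assumption: total raw disk across the whole cluster
REPLICATION = 2   # replication factor from the slide
NODES = 30        # persistent EC2 instances from the slide

print(f"usable ≈ {usable_hdfs_capacity_tb(RAW_TB, REPLICATION):.1f} TB")
print(f"raw per node ≈ {RAW_TB / NODES:.1f} TB")
```

Compared with the HDFS default replication factor of 3, a factor of 2 buys capacity at the cost of tolerating fewer simultaneous disk or node failures.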
Infrastructure
• Search
   • Solr
   • Incremental updates, moving toward event-driven updates
• Recommendation/Graph
   • Hadoop
   • Native Java MapReduce
   • Bash for workflow
• Persistence
   • MySQL
   • HDFS
   • HBase
   • MongoDB (investigating Cassandra and Riak)
Infrastructure
• Stream
   • RabbitMQ
   • Internal firehose (investigating Kafka)
• Offline
   • MapReduce
   • Hadoop Streaming
   • Hive
   • Hue
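Hadoop Streaming lets any executable act as mapper or reducer: input lines arrive on stdin, the mapper writes `key<TAB>value` lines, the framework sorts them by key, and the reducer reads the sorted lines on stdin. A minimal sketch of that protocol, assuming a hypothetical `order_id,event_id` input format and counting orders per event (the slides do not show Eventbrite's actual Streaming jobs):

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'event_id<TAB>1' for each hypothetical 'order_id,event_id' input line."""
    for line in lines:
        line = line.strip()
        if line:
            _order_id, event_id = line.split(",", 1)
            yield f"{event_id}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; relies on input sorted by key, which the shuffle guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run(step: str, stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Entry point a streaming script would call with 'map' or 'reduce' from sys.argv."""
    stage = mapper if step == "map" else reducer
    for out in stage(stdin):
        stdout.write(out + "\n")
```

If this lived in a hypothetical `orders.py` whose main block calls `run(sys.argv[1])`, then `cat orders.csv | python orders.py map | sort | python orders.py reduce` would mimic the job locally, with `sort` standing in for the shuffle; on the cluster the same script would be passed to the streaming jar through its `-mapper` and `-reducer` options.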
Infrastructure - Sqoozie
• Workflow for MySQL imports into HDFS
    • Generates Sqoop commands
    • Runs these imports in parallel
• Transparent to schema changes
• Include or exclude at the column, data-type, or table level
• Data type casting: tinyint(1) → Integer
• Distributed table imports
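Sqoozie itself is internal, so the following is only a sketch of the idea: generate one `sqoop import` command per table, apply include/exclude rules at the table, column, and data-type level, cast `tinyint(1)` columns to `Integer`, and run the imports in parallel. `TableSpec`, `EXCLUDED_TYPES`, `generate`, and `run_all` are hypothetical names; the Sqoop options used (`--connect`, `--table`, `--columns`, `--target-dir`, `--map-column-java`) are standard `sqoop import` arguments. MySQL's JDBC driver reports `tinyint(1)` as a boolean by default, which is a likely reason for the cast.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical data-type exclusion rule: skip binary columns.
EXCLUDED_TYPES = {"blob", "longblob"}


@dataclass
class TableSpec:
    """Hypothetical per-table config: MySQL column types plus per-column exclusions."""
    name: str
    columns: dict[str, str]                      # column name -> MySQL type
    exclude_columns: set[str] = field(default_factory=set)


def sqoop_command(jdbc_url: str, table: TableSpec, target_root: str) -> list[str]:
    """Build the argument list for one `sqoop import` (credentials omitted)."""
    cols = [c for c, t in table.columns.items()
            if c not in table.exclude_columns and t.lower() not in EXCLUDED_TYPES]
    cmd = ["sqoop", "import",
           "--connect", jdbc_url,
           "--table", table.name,
           "--columns", ",".join(cols),
           "--target-dir", f"{target_root}/{table.name}"]
    # Map tinyint(1) columns to Java Integer instead of the driver's boolean.
    casts = [f"{c}=Integer" for c in cols if table.columns[c].lower() == "tinyint(1)"]
    if casts:
        cmd += ["--map-column-java", ",".join(casts)]
    return cmd


def generate(jdbc_url: str, tables: list[TableSpec], target_root: str,
             exclude_tables: frozenset[str] | set[str] = frozenset()) -> list[list[str]]:
    """One command per table, honouring table-level exclusions."""
    return [sqoop_command(jdbc_url, t, target_root)
            for t in tables if t.name not in exclude_tables]


def run_all(commands: list[list[str]], max_workers: int = 4) -> list[int]:
    """Run the generated imports in parallel and return each exit code."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Because the column list is rebuilt from the table's current schema each run, a new column shows up in the next import without editing any command by hand, which is one way to get the "transparent to schema changes" property.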
Infrastructure - Blammo
• Raw logs are imported into HDFS via Flume
• Almost real time: about 5 minutes of latency
• Logs are key-value pairs in JSON
• Each log producer publishes its schema in YAML
• The Hive schema and the YAML schema are kept in sync using Thrift
• Controls inclusion and exclusion
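A minimal sketch of the parsing side of such a pipeline: one JSON key-value log line is checked against its producer's schema, excluded fields are dropped, and the result is written as a tab-separated row in schema column order. The schema is hypothetical, and it is a Python dict rather than YAML so the sketch needs only the standard library; the Thrift synchronization step is not shown.

```python
import json

# Hypothetical schema for one log producer (the slides say producers publish YAML).
CHECKOUT_SCHEMA = {
    "fields": [("user_id", int), ("event_id", int), ("action", str), ("ts", float)],
    "exclude": {"ts"},
}

HIVE_NULL = "\\N"  # Hive's default NULL marker in text files


def parse_log_line(line: str, schema: dict) -> dict:
    """Parse one JSON log line, keeping only included schema fields (unknown keys are dropped)."""
    record = json.loads(line)
    row = {}
    for name, typ in schema["fields"]:
        if name in schema["exclude"]:
            continue
        value = record.get(name)
        try:
            row[name] = None if value is None else typ(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"field {name!r}: cannot convert {value!r} to {typ.__name__}") from exc
    return row


def to_hive_row(row: dict, schema: dict) -> str:
    """Serialize a parsed row as tab-separated text in schema column order."""
    cols = [name for name, _ in schema["fields"] if name not in schema["exclude"]]
    return "\t".join(HIVE_NULL if row.get(n) is None else str(row[n]) for n in cols)
```

Driving both the parser and the table layout from the same per-producer schema is what keeps the Hive columns and the log format from drifting apart.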
Recommendations
You would like to attend this event
Recommendation Engines
• Item Hierarchy (You bought a camera, so you need batteries: Amazon)
• Collaborative Filtering, User-User Similarity (People who bought a camera also bought batteries: Amazon)
• Collaborative Filtering, Item-Item Similarity (You like The Godfather, so you will like Scarface: Netflix)
• Social Graph Based (Your friends like Lady Gaga, so you will like Lady Gaga; PYMK, "People You May Know": Facebook, LinkedIn)
• Interest Graph Based (Your friends who, like you, like rock music are attending an Eric Clapton event: Eventbrite)
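To make the item-item collaborative filtering idea concrete, here is a minimal sketch of one common variant: cosine similarity between items over binary user-item interactions, with a user's unseen items scored by summing their similarity to items the user already has. This is an illustration of the general technique, not the method used by Eventbrite or Netflix; the data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_item_similarity(user_items: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Cosine similarity between items over binary user -> items interactions."""
    item_count: dict[str, int] = defaultdict(int)
    pair_count: dict[tuple[str, str], int] = defaultdict(int)
    for items in user_items.values():
        unique = sorted(set(items))
        for i in unique:
            item_count[i] += 1
        for a, b in combinations(unique, 2):   # each co-occurring pair once per user
            pair_count[(a, b)] += 1
    sim: dict[str, dict[str, float]] = defaultdict(dict)
    for (a, b), n in pair_count.items():
        s = n / sqrt(item_count[a] * item_count[b])
        sim[a][b] = s
        sim[b][a] = s
    return sim


def recommend(user: str, user_items: dict[str, list[str]],
              sim: dict[str, dict[str, float]], k: int = 3) -> list[str]:
    """Score unseen items by summed similarity to the user's items; return the top k."""
    seen = set(user_items.get(user, ()))
    scores: dict[str, float] = defaultdict(float)
    for i in seen:
        for j, s in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += s
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]
```

For example, if most people who liked "godfather" and "scarface" also liked "casino", a user who has only the first two gets "casino" recommended.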
Why Interest?
• Events are social
• Events are about interests
• A dense graph is irrelevant
• Interests change
How do we know your interests?
• We ask you
• Based on your activity
   • Events attended
   • Events browsed
• Facebook interests
   • User interests have to be matched to event categories
   • Static
• Machine learning
   • Logistic regression fit by maximum likelihood estimation (MLE)
   • The sparse matrix is generated using MapReduce
   • One model for each interest
Model-Based vs. Clustering
• Item-item vs. user-user
• Building the social graph is the clustering step
• Social graph recommendation is a ranking problem
Implicit Social Graph
[Diagram: users U1 to U5 linked to one another through the events E1 to E4 they attended]
Mixed Social Graph
[Diagram: users U1 to U5 linked through events E1 to E3, combined with FB (Facebook) and LI (LinkedIn) connections]
15M * 260 * 260 ≈ 1.01 trillion edges
4 billion edges ranked
• Each node is a feature vector representing a user
• Each edge is a feature vector representing a relationship
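Since each edge is a feature vector and recommendation is framed as ranking, one simple way to rank a user's candidate edges is to score each edge's features with a linear model and keep the top k. The feature names and weights below are hypothetical (in practice the weights would come from a trained model). `heapq.nlargest` avoids fully sorting each user's candidate list.

```python
import heapq

# Hypothetical edge features and weights.
WEIGHTS = {"shared_events": 1.0, "shared_interests": 0.5, "fb_friend": 2.0}


def score(features: dict[str, float]) -> float:
    """Linear score of one edge's feature vector; unknown features get weight 0."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def top_k_edges(edges, k: int) -> list[tuple[str, float]]:
    """edges: iterable of (neighbor_id, feature dict). Returns [(neighbor_id, score)], best first."""
    scored = ((score(features), neighbor) for neighbor, features in edges)
    return [(neighbor, s) for s, neighbor in heapq.nlargest(k, scored)]
```

Ranking per user in this way is what turns a trillion-scale candidate space into a much smaller set of kept edges.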
Feature Generation
• Mixed features
• A series of MapReduce jobs
• Output is written to HDFS as flat files, which become the input to subsequent jobs
• Orders → event attendees
    • MAP: eid: uid
    • REDUCE: eid: [uid]
• Attendees → social graph
    • Input: eid: [uid]
    • MAP: uid_i: [uid]
    • REDUCE: uid: [neighbors]
• Interest-based features, user-specific features, graph mining, etc.
• Upload feature values to HBase
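The two jobs above can be simulated in memory to show the data flow. `run_mapreduce` is a hypothetical stand-in for a MapReduce job (map, group by key as the shuffle does, then reduce), and orders are assumed to be `(uid, eid)` pairs. The slide's map step emits `uid_i: [uid]`; whether each list includes `uid_i` itself is not stated, so this sketch leaves it out.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one job: map, group values by key (the shuffle), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return list(chain.from_iterable(reducer(key, groups[key]) for key in sorted(groups)))


# Job 1: orders -> event attendees
def orders_map(order):
    uid, eid = order                    # assumption: an order is a (uid, eid) pair
    yield eid, uid


def attendees_reduce(eid, uids):
    yield eid, sorted(set(uids))


# Job 2: attendees -> social graph
def graph_map(record):
    eid, uids = record
    for uid in uids:                    # uid_i: [the other attendees of this event]
        yield uid, [u for u in uids if u != uid]


def graph_reduce(uid, neighbor_lists):
    yield uid, sorted({n for lst in neighbor_lists for n in lst})
```

The map step of the second job emits one list per attendee, each listing the other attendees, so its output grows roughly with the square of an event's attendance; that per-event fan-out is where edge counts like the one above come from.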
HBase
• Collects data from multiple MapReduce jobs
• Stores the entire social graph
• Over one million writes per second
HBase
    rowid     neighbors   events   featureX
    2718282   101         3        0.3678795
HBase
rowid     314159:n   314159:e   314159:fx   161803:n   161803:e   161803:fx
2718282   31         1          0.3183      83         2          0.618
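The second HBase table appears to use a wide-row layout: one row per user, with a group of columns for each neighbor ID and the suffixes `n`, `e`, and `fx` for that edge's features. This reading, including what the suffixes mean and which column family the columns belong to (the slide does not show one), is an assumption. A minimal sketch of flattening a user's edges into such a row and reading them back, without any HBase client:

```python
from collections import defaultdict


def to_wide_row(edges: dict[str, dict[str, float]]) -> dict[str, float]:
    """{neighbor_id: {suffix: value}} -> {'<neighbor_id>:<suffix>': value} for one row."""
    return {f"{nid}:{suffix}": value
            for nid, feats in edges.items()
            for suffix, value in feats.items()}


def from_wide_row(row: dict[str, float]) -> dict[str, dict[str, float]]:
    """Inverse of to_wide_row: split each column name on its last ':'."""
    edges: dict[str, dict[str, float]] = defaultdict(dict)
    for qualifier, value in row.items():
        nid, suffix = qualifier.rsplit(":", 1)
        edges[nid][suffix] = value
    return dict(edges)


def sorted_qualifiers(row: dict[str, float]) -> list[str]:
    """HBase keeps a row's cells ordered by column bytes; mimic that ordering."""
    return sorted(row, key=lambda q: q.encode("utf-8"))
```

With this layout, all of a user's edges come back from a single row read, and adding a neighbor only adds columns to an existing row.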
Tips & Tricks
• Distributed cache database
   • Sped up some MapReduce jobs by hours
   • Be sure to use counters!
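The slide gives no detail on the "distributed cache database". One common reading is shipping a small lookup table to every task via Hadoop's distributed cache, so records can be joined map-side instead of through a reduce-side join. Counters fit naturally alongside it, for example to count lookup misses. The sketch below simulates that: `load_lookup`, `EnrichingMapper`, and the `eid,category` file format are hypothetical, and a `collections.Counter` stands in for Hadoop job counters. In Hadoop Streaming, a task can increment a real counter by writing `reporter:counter:<group>,<counter>,<amount>` to stderr.

```python
from collections import Counter
from typing import Iterable, Optional


def load_lookup(lines: Iterable[str]) -> dict[str, str]:
    """Parse hypothetical 'eid,category' lines, e.g. from a file shipped via the distributed cache."""
    lookup = {}
    for line in lines:
        line = line.strip()
        if line:
            eid, category = line.split(",", 1)
            lookup[eid] = category
    return lookup


class EnrichingMapper:
    """Map-side join against a lookup table loaded once per task."""

    def __init__(self, lookup: dict[str, str]):
        self.lookup = lookup
        self.counters = Counter()          # stand-in for Hadoop job counters

    def map(self, uid: str, eid: str) -> Optional[tuple[str, str]]:
        category = self.lookup.get(eid)
        if category is None:
            self.counters["lookup_miss"] += 1   # surfaces silent data gaps
            return None
        self.counters["lookup_hit"] += 1
        return uid, category
```

Loading the table once per task, rather than querying a database per record, is where this kind of speedup typically comes from; the miss counter makes it obvious when the shipped table is stale.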
Tips & Tricks
• Hive (ab)uses
   • Almost as many Hive jobs as custom ones
   • “flip join”
   • Statistical functions using Hive
   • UDFs
Tips & Tricks
• Memory, memory, memory
• LZO, WAL
• Combiners are great until
• The shuffle and sort stage
• The Hadoop ecosystem is still new
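A combiner runs reduce-style logic on each mapper's output before the shuffle, so fewer key-value pairs cross the network. It is only safe when the operation is associative and commutative (such as summing), because Hadoop may run the combiner zero or more times. The slide's point about where combiners stop helping is cut off, so this sketch shows only the upside, using hypothetical input splits of event IDs.

```python
from collections import Counter


def map_output(split: list[str]) -> list[tuple[str, int]]:
    """Each input record (an event id) becomes one (event_id, 1) pair."""
    return [(eid, 1) for eid in split]


def combine(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Pre-aggregate one mapper's output; valid because addition is associative and commutative."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffle_size(splits: list[list[str]], use_combiner: bool) -> int:
    """Number of (key, value) pairs that would be sent through the shuffle."""
    total = 0
    for split in splits:
        pairs = map_output(split)
        total += len(combine(pairs) if use_combiner else pairs)
    return total


def reduce_all(splits: list[list[str]], use_combiner: bool) -> Counter:
    """Final per-key totals; identical with or without the combiner."""
    final: Counter = Counter()
    for split in splits:
        pairs = map_output(split)
        for key, value in (combine(pairs) if use_combiner else pairs):
            final[key] += value
    return final
```

Running both modes on the same splits gives the same totals but fewer shuffled pairs with the combiner; the saving shrinks when keys rarely repeat within a single mapper's output.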
Questions?
