COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013

Presentation from Vipul Sharma, Eventbrite
#dataconf
More at http://event.gigaom.com/structuredata/

Published in: Technology

Transcript of "COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013"

  1. COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS
     SPEAKER: Vipul Sharma, Director of Data Engineering, Eventbrite
     Monday, April 1, 13
  2. Real Time Data Processing at Scale
     Vipul Sharma – Director of Data Engineering
  3. Eventbrite by the Numbers
  4. Eventbrite by the Numbers
     1.5 million events
     80 million tickets sold
     $1 billion in gross ticket sales
     Events in 179 countries
  5. Who am I?
     Director of Data Engineering at Eventbrite
     Infrastructure, Data Science, Analytics, Spam and Fraud
     linkedin.com/in/vipulsharma3
     @vipulsharma
     vipul@eventbrite.com
  6. Real Time
     • Definition of real time varies with use case
     • Real time at scale is a challenge
     • Active learning requires real time data processing
     • Spam/Fraud
     • Discovery
     • Search
     • Analytics
     • Real time analytics
     • Data Changes
     • Changes in inventory, user settings, etc.
  7. Scaling for Growth
     • Decouple Services
       • Decouple services based on CAP, size, and growth
       • NoSQL is attractive for out-of-the-box sharding, replication, and multi-data-center support, along with high write speeds
       • Multiple data stores pose the challenge of moving data between services in real time
     • Batch Processing
       • Batch processing for big data, e.g. data science, analytics, etc.
       • MapReduce is not built for real time
       • Data locality requires data to be stored on HDFS
       • Syncing data to Hadoop in real time is a challenge
  8. (No text on this slide.)
  9. Challenges with Real Time
     • Data Flow
       • How to transfer data captured in logs to services in real time
       • How to transfer data captured in the database to services in real time
     • Data Processing
       • How to process significant data in real time
       • Distributed data processing for real time
  10. Data Flow
     • Database polling
       • Rather than having each application poll the database, build a single polling service
       • Downstream applications poll from this service
       • Built for consistency and read scalability
       • Example: Event Cache
       • Excited about LinkedIn's Databus - http://data.linkedin.com/projects/databus
     • Persisted Queues
       • Transfer logs via a distributed persisted message queue
       • Downstream applications subscribe to these queues and get a stream of data
       • Example: Firehose
       • Excited about LinkedIn's Kafka - http://kafka.apache.org/index.html
  11. Data Processing
     • Denormalization
       • Write data ready to serve
       • NoSQL is built for denormalization
       • Example: See who's visiting
     • Distributed Data Processing
       • Complex business logic needs more than denormalization
       • Example: API stats using Storm
       • http://storm-project.net/
  12. Questions?
     See it in action. Download our app: eventbrite.com/eventbriteapp
  13. Thank You!
     @vipulsharma / vipul@eventbrite.com
  14. (No text on this slide.)
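
The "Persisted Queues" and "Denormalization" bullets on slides 10 and 11 can be illustrated together with a small sketch. It is not Eventbrite's Firehose or Kafka, and nothing in it comes from the talk beyond the pattern itself: the class names `PersistedQueue` and `VisitorView`, the `drain` helper, and the message fields are all hypothetical, and the "persisted" log is just an in-memory list. The idea shown is that each downstream subscriber keeps its own read offset into a shared append-only log and writes what it reads into a view that is already shaped for serving.

```python
from collections import defaultdict


class PersistedQueue:
    """Hypothetical in-memory stand-in for a persisted message log."""

    def __init__(self):
        self._log = []       # append-only list of messages
        self._offsets = {}   # subscriber name -> index of next message to read

    def publish(self, message):
        self._log.append(message)

    def subscribe(self, name):
        # A new subscriber starts reading from the beginning of the log.
        self._offsets.setdefault(name, 0)

    def poll(self, name, max_messages=100):
        # Return the next batch for this subscriber and advance only its offset.
        start = self._offsets[name]
        batch = self._log[start:start + max_messages]
        self._offsets[name] = start + len(batch)
        return batch


class VisitorView:
    """Denormalized, ready-to-serve view: event_id -> visitors in arrival order."""

    def __init__(self):
        self._visitors = defaultdict(list)

    def apply(self, message):
        # Only page views matter for this view; other message types are ignored.
        if message["type"] == "page_view":
            visitors = self._visitors[message["event_id"]]
            if message["user"] not in visitors:
                visitors.append(message["user"])

    def who_is_visiting(self, event_id):
        # Serving is a single lookup; no joins or aggregation at read time.
        return list(self._visitors.get(event_id, []))


def drain(queue, name, view):
    """Apply every message currently available to one subscriber's view."""
    count = 0
    while True:
        batch = queue.poll(name)
        if not batch:
            return count
        for message in batch:
            view.apply(message)
        count += len(batch)


if __name__ == "__main__":
    queue = PersistedQueue()
    queue.subscribe("visitor_view")
    queue.publish({"type": "page_view", "event_id": 42, "user": "ana"})
    queue.publish({"type": "page_view", "event_id": 42, "user": "bo"})
    view = VisitorView()
    drain(queue, "visitor_view", view)
    print(view.who_is_visiting(42))
```

Because offsets are tracked per subscriber, a slow consumer does not hold back a fast one, and a new consumer can replay the log from the start. A real deployment would need durable storage, partitioning, and failure handling, none of which this sketch attempts.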