COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013

Presentation from Vipul Sharma, Eventbrite
#dataconf
More at http://event.gigaom.com/structuredata/

Transcript

  • 1. COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS. Speaker: Vipul Sharma, Director of Data Engineering, Eventbrite (Monday, April 1, 13)
  • 2. Real-Time Data Processing at Scale. Vipul Sharma, Director of Data Engineering
  • 3. Eventbrite by the Numbers
  • 4. Eventbrite by the Numbers
      • 1.5 million events
      • 80 million tickets sold
      • $1 billion in gross ticket sales
      • Events in 179 countries
  • 5. Who am I?
      • Director of Data Engineering at Eventbrite
      • Infrastructure, Data Science, Analytics, Spam and Fraud
      • linkedin.com/in/vipulsharma3, @vipulsharma, vipul@eventbrite.com
  • 6. Real Time
      • The definition of real time varies with the use case
      • Real time at scale is a challenge
      • Active learning requires real-time data processing
          • Spam/Fraud
          • Discovery
          • Search
      • Analytics
          • Real-time analytics
      • Data changes
          • Changes in inventory, user settings, etc.
  • 7. Scaling for Growth
      • Decouple services
          • Decouple services based on CAP (consistency, availability, partition tolerance) requirements, size and growth
          • NoSQL is attractive for out-of-the-box sharding, replication and multi-data-center support, along with high write speeds
          • Multiple data stores pose the challenge of moving data between services in real time
      • Batch processing
          • Batch processing for big data, e.g. data science and analytics
          • MapReduce is not built for real time
          • Data locality requires data to be stored on HDFS
          • Syncing data to Hadoop in real time is a challenge
  • 9. Challenges with Real Time
      • Data flow
          • How to transfer data captured in logs to services in real time
          • How to transfer data captured in the database to services in real time
      • Data processing
          • How to process significant volumes of data in real time
          • Distributed data processing for real time
  • 10. Data Flow
      • Database polling
          • Rather than having each application poll the database, build a single polling service
          • Downstream applications poll this service
          • Built for consistency and read scalability
          • Example: Event Cache
          • Excited about LinkedIn’s Databus: http://data.linkedin.com/projects/databus
      • Persisted queues
          • Transfer logs via a distributed, persisted message queue
          • Downstream applications subscribe to these queues and receive a stream of data
          • Example: Firehose
          • Excited about LinkedIn’s Kafka: http://kafka.apache.org/index.html
  • 11. Data Processing
      • Denormalization
          • Write data in a ready-to-serve form
          • NoSQL stores are built for denormalization
          • Example: See who’s visiting
      • Distributed data processing
          • Complex business logic needs more than denormalization
          • Example: API stats using Storm
          • http://storm-project.net/
  • 12. Questions? See it in action. Download our app: eventbrite.com/eventbriteapp
  • 13. Thank You! @vipulsharma / vipul@eventbrite.com
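
Slide 7 names out-of-the-box sharding and replication as reasons NoSQL stores are attractive. The sketch below illustrates only the basic idea, under stated assumptions: a stable hash maps each key to a primary shard, and replicas go to the following shards. The names `shard_for` and `replicas_for` and the modulo scheme are hypothetical and are not how Eventbrite or any particular store partitions data. Production stores such as Cassandra use consistent hashing so that adding a node moves only a fraction of the keys, whereas plain modulo hashing reshuffles most keys whenever `num_shards` changes.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash (illustrative modulo scheme).

    md5 is used only because it is stable across processes (Python's built-in
    hash() is salted per process for strings); it is not used for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, replication_factor: int) -> List[int]:
    """Return the primary shard followed by the next shards, wrapping around."""
    if not 1 <= replication_factor <= num_shards:
        raise ValueError("replication_factor must be between 1 and num_shards")
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]


if __name__ == "__main__":
    # Hypothetical keys; prints each key's primary shard and its replicas.
    for user_id in ["user:1", "user:2", "user:3"]:
        print(user_id, replicas_for(user_id, num_shards=8, replication_factor=3))
```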

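Slide 10 describes two data-flow patterns: a single service that polls the database on behalf of all downstream applications, and a persisted message queue that subscribers read as a stream. The sketch below models both in memory, under stated assumptions: `PollingService`, `PersistedLog`, the per-row `version` change counter and the `fetch_changes_since` callback are hypothetical names, not the APIs of Event Cache, Firehose, Databus or Kafka. A real broker such as Kafka persists and replicates its log on disk; here the log is a Python list so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Row:
    id: int
    version: int  # hypothetical change counter, bumped on every write
    payload: dict


class PollingService:
    """A single poller reads database changes; downstream apps poll this
    service instead of each one polling the database."""

    def __init__(self, fetch_changes_since: Callable[[int], List[Row]]):
        # Hypothetical DB query: returns rows whose version exceeds the argument.
        self._fetch = fetch_changes_since
        self._high_water = 0           # highest version seen so far
        self._changes: List[Row] = []  # change history served downstream

    def poll_database(self) -> None:
        for row in self._fetch(self._high_water):
            self._changes.append(row)
            self._high_water = max(self._high_water, row.version)

    def changes_since(self, version: int) -> List[Row]:
        return [r for r in self._changes if r.version > version]


class PersistedLog:
    """Append-only message log where each subscriber tracks its own offset.
    Kept in memory for the sketch; a real broker persists and replicates it."""

    def __init__(self) -> None:
        self._entries: List[str] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, message: str) -> int:
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the new entry

    def consume(self, subscriber: str, max_messages: int = 100) -> List[str]:
        start = self._offsets.get(subscriber, 0)
        batch = self._entries[start:start + max_messages]
        self._offsets[subscriber] = start + len(batch)
        return batch


if __name__ == "__main__":
    db = [Row(1, 1, {"name": "Concert"}), Row(2, 2, {"name": "Meetup"})]
    service = PollingService(lambda v: [r for r in db if r.version > v])
    service.poll_database()
    print([r.id for r in service.changes_since(1)])  # [2]

    log = PersistedLog()
    for line in ["GET /e/1", "GET /e/2"]:
        log.publish(line)
    print(log.consume("search"), log.consume("analytics", max_messages=1))
```

Because each subscriber keeps its own offset, a slow consumer (say, analytics) does not hold back a fast one (say, search). Kafka consumers rely on the same idea, although Kafka tracks offsets per partition and per consumer group.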
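
Slide 11 contrasts denormalization (writing data already shaped for the read that will serve it) with distributed processing for more complex logic, citing "see who's visiting" and API stats computed with Storm. The sketch below shows a single-process version of each, assuming a capped per-event visitor list and a sliding-window request count per API endpoint. `VisitorFeed` and `SlidingWindowCounter` are hypothetical names; this is plain Python, not Storm's API and not Eventbrite's implementation.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class VisitorFeed:
    """Denormalized write path: each visit is written straight into the
    per-event list that the 'who's visiting' view reads, so a read is a lookup."""

    def __init__(self, max_visitors: int = 10):
        self._feeds: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_visitors))

    def record_visit(self, event_id: str, user_name: str) -> None:
        feed = self._feeds[event_id]
        if user_name in feed:
            feed.remove(user_name)  # move a repeat visitor to the front
        feed.appendleft(user_name)  # maxlen drops the oldest visitor

    def visitors(self, event_id: str) -> List[str]:
        return list(self._feeds.get(event_id, ()))


class SlidingWindowCounter:
    """Per-key counts over the last `window_seconds`, similar in spirit to a
    rolling-count bolt. Assumes timestamps arrive in non-decreasing order."""

    def __init__(self, window_seconds: int = 60):
        self.window = window_seconds
        self._events: Deque[Tuple[float, str]] = deque()
        self._counts: Dict[str, int] = defaultdict(int)

    def _expire(self, now: float) -> None:
        # Drop events at or before now - window, keeping (now - window, now].
        while self._events and self._events[0][0] <= now - self.window:
            _, key = self._events.popleft()
            self._counts[key] -= 1
            if self._counts[key] == 0:
                del self._counts[key]

    def record(self, key: str, now: float) -> None:
        self._expire(now)
        self._events.append((now, key))
        self._counts[key] += 1

    def counts(self, now: float) -> Dict[str, int]:
        self._expire(now)
        return dict(self._counts)


if __name__ == "__main__":
    feed = VisitorFeed(max_visitors=3)
    for name in ["ann", "bob", "ann"]:
        feed.record_visit("event-1", name)
    print(feed.visitors("event-1"))  # ['ann', 'bob']

    stats = SlidingWindowCounter(window_seconds=60)
    for t, endpoint in [(0, "/events"), (30, "/events"), (45, "/users")]:
        stats.record(endpoint, t)
    print(stats.counts(75))  # {'/events': 1, '/users': 1}
```

To spread the counting across machines in a Storm topology, a fields grouping on the endpoint name would route every tuple for a given endpoint to the same bolt instance, so each instance could keep a counter like this for its share of the keys.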