Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Transcript

  • 1. Processing Terabytes of Data in Real-Time. Anthony Watkins, Senior Director of Developer Relations. November 14, 2013. www.flurry.com, @flurrymobile, @antwatkins
  • 2. www.flurry.com
  • 3. Flurry is a leading mobile advertising and analytics provider (Publisher, Advertiser, Audience). AppCircle: Applications: 10,000+; Devices/month: 300M; Conversions/month: 120M. AppSpot: Applications: 2,500+; Devices/month: 250M; Impressions/month: 7.5B. Analytics: Applications: 400,000; Devices/month: 1.2B; Data points/month: 1.9T
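To put the "Data points/month: 1.9T" figure on slide 3 in per-second terms, here is a rough back-of-the-envelope calculation. It assumes a 30-day month and perfectly uniform traffic, neither of which comes from the slides, so treat the result as an order-of-magnitude average and not a measured peak rate.

```python
# Average ingest rate implied by slide 3's "Data points/month: 1.9T".
# ASSUMPTIONS (not from the slides): a 30-day month and uniform traffic.
DATA_POINTS_PER_MONTH = 1.9e12
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_rate_per_second(points_per_month: float,
                            seconds: int = SECONDS_PER_MONTH) -> float:
    """Return the mean number of data points arriving each second."""
    return points_per_month / seconds


rate = average_rate_per_second(DATA_POINTS_PER_MONTH)
print(f"about {rate:,.0f} data points per second on average")
```

Under these assumptions the average comes out to roughly 733,000 data points per second.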
  • 4. Topics: The Path to Real-Time Processing • Why Flurry switched from a MapReduce framework to pipeline processing • How Flurry uses Kafka in data processing • Tuning Kafka to work in Flurry’s environment • Flurry monitoring and error handling of streams
  • 5. The Why
  • 6. Past Processing Model. Diagram labels: Device Reports, NoSQL DataStore, Batch, Collectors, MapReduce (jobs), External Action
  • 7. Flurry Analytics MapReduce Architecture. Diagram labels: Agent Portal, Data Log Processor, Developer Portal, Metrics Computer, HDFS, HBase, HBase, Hadoop/HBase, Jetty, Jetty, HTTP, Binary Encoded Data, Raw Data Log Archive, Metrics Table (Cube), Normalized Data Storage, User Profile Data, MySQL, Hadoop Map/Reduce, Hadoop Map/Reduce, Web Layer, Metrics Processing
  • 8. Data Collection and Processing in MR: Pros. Diagram labels: MapReduce (jobs)
  • 9. Data Collection and Processing in MR: Cons. Diagram labels: Device Reports, MapReduce (jobs), Job Time, Startup Time
  • 10. The Move to Kafka. Diagram labels: Flurry, Kafka
  • 11. About Kafka: Origin. November 2010, June 2011, November 2012
  • 12. About Kafka. Diagram labels: Producer, Producer, Producer; Kafka Broker; Consumer, Consumer, Consumer
  • 13. About Kafka. Diagram labels: Kafka Broker *. * Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
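Slide 13 points to Kafka's log anatomy diagram, which depicts a partition as an append-only sequence of messages addressed by offset. The following is a minimal in-memory sketch of that idea. The `PartitionLog` class and its method names are hypothetical and are not part of any Kafka client API, and sequential integer offsets are assumed here for simplicity.

```python
from typing import List, Tuple


class PartitionLog:
    """Toy stand-in for one partition: an append-only list of messages.

    ASSUMPTION: offsets are sequential integers starting at 0.
    """

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        """Append a message and return the offset it was written at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs starting at offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_messages, len(self._messages))
        return [(i, self._messages[i]) for i in range(offset, end)]


log = PartitionLog()
for payload in (b"report-a", b"report-b", b"report-c"):
    log.append(payload)

# A consumer that remembers it last stopped at offset 1 resumes from there.
batch = log.read(offset=1, max_messages=2)
```

Because existing entries are never rewritten, a consumer only needs to remember a single offset to resume reading where it left off.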
  • 14. About Kafka. Diagram labels: Producer 1, Producer 2, Producer N; Kafka Cluster: Broker 1 (P0, P2), Broker 2 (P1, P3); Consumer Group: C1, C2, C3
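Slide 14 shows four partitions (P0 to P3) being read by a consumer group of three (C1 to C3), where each partition is owned by one consumer in the group at a time. The sketch below illustrates one plausible way to divide the partitions: contiguous ranges, with the leftover partitions going to the first consumers. The function name `assign_partitions` is hypothetical, and the slide does not say which consumer actually owns which partition.

```python
from typing import Dict, List


def assign_partitions(partitions: List[str],
                      consumers: List[str]) -> Dict[str, List[str]]:
    """Split partitions into contiguous ranges, one range per consumer.

    The first (len(partitions) % len(consumers)) consumers, in sorted
    order, each receive one extra partition.
    """
    if not consumers:
        raise ValueError("need at least one consumer")
    base, extra = divmod(len(partitions), len(consumers))
    assignment: Dict[str, List[str]] = {}
    start = 0
    for index, consumer in enumerate(sorted(consumers)):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Illustrative only: the partition and consumer labels come from slide 14.
assignment = assign_partitions(["P0", "P1", "P2", "P3"], ["C1", "C2", "C3"])
print(assignment)
```

With four partitions and three consumers, one consumer ends up with two partitions and the other two with one each.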
  • 15. Why Kafka for Flurry. Diagram labels: Device Reports, MapReduce (jobs), Kafka, Startup Time
  • 16. Introducing the Data Log Consumer (DLC). Diagram labels: Agent Portal, Data Log Consumer, Developer Portal, Metrics Computer, HDFS, HBase, HBase, Hadoop/HBase, Jetty, Jetty, HTTP, Binary Encoded Data, Metrics Table (Cube), Normalized Data Storage, User Profile Data, MySQL, Kafka, Hadoop Map/Reduce, Web Layer, Metrics Processing
  • 17. Tuning Kafka for Flurry: Challenges • ZooKeeper timeouts • Completely async service • Default fsync interval • Commit threshold from local environments
  • 18. How Flurry Uses Kafka: Infrastructure and Setup. Diagram labels: Consumer Group: C1, C2, C…, C325; Kafka Cluster: B1, B2, B3 (Broker); Topic: P1, P2, P…, P400
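Slide 18 lists 325 consumers in the group and 400 partitions in the topic. Since a partition is read by at most one consumer in a group, having at least as many partitions as consumers means no consumer has to sit idle. The arithmetic below works out the per-consumer load, assuming the partitions are spread as evenly as possible; the slides do not state how Flurry's partitions were actually distributed.

```python
PARTITIONS = 400  # from slide 18
CONSUMERS = 325   # from slide 18

# ASSUMPTION: partitions are spread as evenly as possible across consumers.
base, extra = divmod(PARTITIONS, CONSUMERS)

consumers_with_extra = extra             # each owns base + 1 partitions
consumers_with_base = CONSUMERS - extra  # each owns base partitions

print(f"{consumers_with_extra} consumers own {base + 1} partitions, "
      f"{consumers_with_base} consumers own {base}")
```

Under that assumption, 75 consumers would read two partitions each and the remaining 250 would read one.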
  • 19. Flurry Monitoring / Error Handling. Monitoring • Alerts • Consumer Failure • Broker Failure Error Handling
  • 20. Next Steps: 0.8. Diagram labels: Data Log Consumer, HDFS, Kafka; Data Log Consumer, Kafka; Kafka Cluster: Broker 1 (P0, P2), Broker 2 (P1, P3); replica partitions P1’, P3’, P0’, P2’
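Slide 20 adds primed partitions (P0’ to P3’) to the two-broker cluster, which reads as the partition replication introduced in Kafka 0.8. The sketch below shows one simple placement rule that keeps every copy of a partition on a different broker. The function `place_replicas` is hypothetical, and the reading that Broker 1 holds P1’ and P3’ while Broker 2 holds P0’ and P2’ is an assumption drawn from the label order on the slide.

```python
from typing import Dict, List


def place_replicas(num_partitions: int, brokers: List[str],
                   replication_factor: int = 2) -> Dict[str, List[str]]:
    """Map each partition to a list of brokers; the first entry is the primary.

    Copies are placed round-robin so no broker holds two copies of the
    same partition.
    """
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication factor must be between 1 and the broker count")
    placement: Dict[str, List[str]] = {}
    for p in range(num_partitions):
        placement[f"P{p}"] = [
            brokers[(p + r) % len(brokers)] for r in range(replication_factor)
        ]
    return placement


# Illustrative only: broker and partition labels come from slide 20.
placement = place_replicas(4, ["Broker 1", "Broker 2"])
for partition, hosts in placement.items():
    print(partition, "primary on", hosts[0], "replica on", hosts[1])
```

With this rule, losing either broker still leaves one copy of every partition on the surviving broker.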
  • 21. Next Steps: Extended Pipeline. Diagram labels: Input Data, NoSQL DataStore, Real-Time, Batch, Collectors, Consumer/Producer Systems, MapReduce (jobs), External Action, External Action
  • 22. Next Steps: Topics and Consumer Groups (Infrastructure and Setup). Diagram labels: Consumer Group 2: C1’, C2’, C…, CN’; Topic 1; Consumer Group 1: C1, C2, C…, CN; Consumer Group N: C1’’, C2’’, C…, CN’’; Topic 2
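Slide 22 shows several consumer groups attached to topics. In Kafka, each consumer group tracks its own read position, so every group sees the full stream independently of the others. The toy model below illustrates that behaviour with a single-partition topic. The `Topic` class and the group names are hypothetical and exist only for this sketch.

```python
from collections import defaultdict
from typing import Dict, List


class Topic:
    """Toy single-partition topic with one read position per consumer group."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._positions: Dict[str, int] = defaultdict(int)

    def publish(self, message: str) -> None:
        """Append a message to the end of the topic."""
        self._messages.append(message)

    def poll(self, group: str) -> List[str]:
        """Return the messages this group has not seen yet and advance it."""
        start = self._positions[group]
        self._positions[group] = len(self._messages)
        return self._messages[start:]


topic = Topic()
topic.publish("report-1")
topic.publish("report-2")
first = topic.poll("group-1")   # group-1 reads both messages
topic.publish("report-3")
second = topic.poll("group-1")  # group-1 reads only the new message
other = topic.poll("group-2")   # group-2 starts from the beginning
```

This independence is what lets a new downstream system be added as another consumer group without disturbing the existing readers.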
  • 23. Thank you. anthony@flurry.com, blog.flurry.com, @flurrymobile, @antwatkins
