Flurry Analytic Backend - Processing Terabytes of Data in Real-time

4,138 views
3,843 views

Published on

0 Comments
8 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
4,138
On SlideShare
0
From Embeds
0
Number of Embeds
1,547
Actions
Shares
0
Downloads
48
Comments
0
Likes
8
Embeds 0
No embeds

No notes for slide

Flurry Analytic Backend - Processing Terabytes of Data in Real-time

  1. 1. www.flurry.com November 14, 2013 Anthony Watkins, Senior Director of Developer Relations Processing Terabytes of Data in Real- Time @flurrymobile @antwatkins
  2. 2. www.flurry.com
  3. 3. Flurry is a leading mobile advertising and analytics provider Publisher Advertiser Audience AppCircle Applications: 10,000+ Devices/month: 300M Conversions/month: 120M AppSpot Applications: 2,500+ Devices/month: 250M Impressions/month: 7.5B Analytics Applications: 400,000 Devices/month: 1.2B Data points/month: 1.9T
  4. 4. • Why Flurry Switched from a MapReduce Framework to pipeline processing • How Flurry uses Kafka in data processing • Tuning of Kafka to work in Flurry’s environment • Flurry Monitoring and error handling of streams Topics The Path to Real-Time Processing www.flurry.com 4
  5. 5. The Why www.flurry.com 5
  6. 6. Past Processing Model www.flurry.com 6 Device Reports NoSQL DataStore Batch Collectors MapReduce (jobs) External Action
  7. 7. Flurry Analytics MapReduce Architecture www.flurry.com 7 Agent Portal Data Log Processor Developer Portal Metrics Computer HDFS HBase HBase Hadoop/Hbase Jetty Jetty HTTP Binary Encoded Data Raw Data Log Archive Metrics Table (Cube) Normalized Data Storage User Profile Data MySQL Hadoop Map/Reduce Hadoop Map/Reduce Web Layer Metrics Processing
  8. 8. Data Collection and Processing in MR Pros www.flurry.com 8 MapReduce (jobs)
  9. 9. Data Collection and Processing in MR Cons www.flurry.com 9 Device Reports MapReduce (jobs) Job Time Startup Time
  10. 10. Flurry Kafka The Move to Kafka www.flurry.com 10
  11. 11. About Kafka Origin www.flurry.com 11 November 2010 June 2011 November 2012
  12. 12. About Kafka www.flurry.com 12 Producer ProducerProducer Kakfa Broker Consumer Consumer Consumer
  13. 13. About Kafka www.flurry.com 13 Kafka Broker * * Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
  14. 14. About Kafka www.flurry.com 14 Producer 1 Producer NProducer 2 Kafka Cluster Broker 1 P0 P2 Broker 2 P1 P3 Consumer Group C1 C2 C3
  15. 15. Why Kafka for Flurry www.flurry.com 15 Device Reports MapReduce (jobs) Kafka Startup Time
  16. 16. Introducing the Data Log Consumer (DLC) www.flurry.com 16 Agent Portal Data Log Consumer Developer Portal Metrics Computer HDFS HBase HBase Hadoop/Hbase Jetty Jetty HTTP Binary Encoded Data Metrics Table (Cube) Normalized Data Storage User Profile Data MySQL Kafka Hadoop Map/Reduce Web Layer Metrics Processing
  17. 17. • Zookeeper timeouts • Completely async service • Default fsync interval • Commit threshold from local environments Tuning Kafka for Flurry Challenges www.flurry.com 17
  18. 18. How Flurry Uses Kafka Infrastructure and Setup www.flurry.com 18 Consumer Group C1 C2 C… C325 Kafka Cluster B1 B2 B3 Broker P1 P2 P… P400 Topic
  19. 19. Flurry Monitoring / Error Handling Monitoring www.flurry.com 19 • Alerts • Consumer Failure • Broker Failure Error Handling
  20. 20. Next Steps: 0.8 www.flurry.com 20 Data Log Consumer HDFS Kafka Data Log Consumer Kafka Kafka Cluster Broker 1 P0 P2 Broker 2 P1 P3 P1’ P3’ P0’ P2’
  21. 21. Next Steps: Extended Pipeline www.flurry.com 21 Input Data NoSQL DataStore Real-Time Batch Collectors Consumer/ Producer Systems MapReduce (jobs) External Action External Action
  22. 22. Next Steps: Topics and Consumer Groups Infrastructure and Setup www.flurry.com 22 Consumer Group 2 C1’ C2’ C… CN’ Topic 1 Consumer Group 1 C1 C2 C… CN Consumer Group N C1’’ C2’’ C… CN’’ Topic 2
  23. 23. www.flurry.com November 14, 2013 anthony@flurry.com blog.flurry.com @flurrymobile @antwatkins Thank you

×