Flurry Analytic Backend - Processing Terabytes of Data in Real-time


Transcript

  • 1. Processing Terabytes of Data in Real-Time. Anthony Watkins, Senior Director of Developer Relations. November 14, 2013. www.flurry.com @flurrymobile @antwatkins
  • 2. www.flurry.com
  • 3. Flurry is a leading mobile advertising and analytics provider (Publisher, Advertiser, Audience). AppCircle: Applications 10,000+; Devices/month 300M; Conversions/month 120M. AppSpot: Applications 2,500+; Devices/month 250M; Impressions/month 7.5B. Analytics: Applications 400,000; Devices/month 1.2B; Data points/month 1.9T
  • 4. Topics: The Path to Real-Time Processing • Why Flurry switched from a MapReduce framework to pipeline processing • How Flurry uses Kafka in data processing • Tuning Kafka to work in Flurry's environment • Flurry's monitoring and error handling of streams
  • 5. The Why
  • 6. Past Processing Model: Device Reports NoSQL DataStore Batch Collectors MapReduce (jobs) External Action
  • 7. Flurry Analytics MapReduce Architecture: Agent Portal Data Log Processor Developer Portal Metrics Computer HDFS HBase HBase Hadoop/HBase Jetty Jetty HTTP Binary Encoded Data Raw Data Log Archive Metrics Table (Cube) Normalized Data Storage User Profile Data MySQL Hadoop Map/Reduce Hadoop Map/Reduce Web Layer Metrics Processing
  • 8. Data Collection and Processing in MR: Pros. MapReduce (jobs)
  • 9. Data Collection and Processing in MR: Cons. Device Reports MapReduce (jobs) Job Time Startup Time
  • 10. The Move to Kafka: Flurry Kafka
  • 11. About Kafka: Origin. November 2010, June 2011, November 2012
  • 12. About Kafka: Producer Producer Producer Kafka Broker Consumer Consumer Consumer
  • 13. About Kafka: Kafka Broker. Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
  • 14. About Kafka: Producer 1 Producer 2 Producer N Kafka Cluster Broker 1 P0 P2 Broker 2 P1 P3 Consumer Group C1 C2 C3
  • 15. Why Kafka for Flurry: Device Reports MapReduce (jobs) Kafka Startup Time
  • 16. Introducing the Data Log Consumer (DLC): Agent Portal Data Log Consumer Developer Portal Metrics Computer HDFS HBase HBase Hadoop/HBase Jetty Jetty HTTP Binary Encoded Data Metrics Table (Cube) Normalized Data Storage User Profile Data MySQL Kafka Hadoop Map/Reduce Web Layer Metrics Processing
  • 17. Tuning Kafka for Flurry: Challenges • ZooKeeper timeouts • Completely async service • Default fsync interval • Commit threshold from local environments
  • 18. How Flurry Uses Kafka: Infrastructure and Setup. Consumer Group C1 C2 C… C325 Kafka Cluster B1 B2 B3 Broker P1 P2 P… P400 Topic
  • 19. Flurry Monitoring / Error Handling: Monitoring • Alerts • Consumer Failure • Broker Failure Error Handling
  • 20. Next Steps: 0.8. Data Log Consumer HDFS Kafka Data Log Consumer Kafka Kafka Cluster Broker 1 P0 P2 Broker 2 P1 P3 P1’ P3’ P0’ P2’
  • 21. Next Steps: Extended Pipeline. Input Data NoSQL DataStore Real-Time Batch Collectors Consumer/Producer Systems MapReduce (jobs) External Action External Action
  • 22. Next Steps: Topics and Consumer Groups (Infrastructure and Setup). Consumer Group 2 C1’ C2’ C… CN’ Topic 1 Consumer Group 1 C1 C2 C… CN Consumer Group N C1’’ C2’’ C… CN’’ Topic 2
  • 23. Thank you. anthony@flurry.com blog.flurry.com @flurrymobile @antwatkins
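
The consumer-group layouts on slides 14 and 18 can be illustrated with a small sketch. It assumes range-style assignment (contiguous blocks of partitions per consumer, with any remainder going to the first consumers), which is roughly how Kafka's high-level consumer balanced partitions. The slides do not say how Flurry's partitions were actually distributed, and reading slide 18 as 400 partitions in total shared by 325 consumers is also an assumption. The function name `assign_partitions` is hypothetical and not part of any Kafka API.

```python
def assign_partitions(num_partitions, num_consumers):
    """Split partition ids 0..num_partitions-1 into contiguous ranges.

    Illustrative only: mimics range-style assignment, where the first
    (num_partitions % num_consumers) consumers each take one extra partition.
    """
    if num_consumers <= 0:
        raise ValueError("need at least one consumer")
    base, extra = divmod(num_partitions, num_consumers)
    assignment = {}
    start = 0
    for consumer in range(num_consumers):
        count = base + (1 if consumer < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# Slide 14: four partitions (P0..P3) shared by three consumers (C1..C3).
small = assign_partitions(4, 3)
print(small)  # {0: [0, 1], 1: [2], 2: [3]}

# Slide 18 (as read above): 400 partitions shared by 325 consumers.
large = assign_partitions(400, 325)
sizes = [len(parts) for parts in large.values()]
print(sizes.count(2), sizes.count(1))  # 75 250
```

Under these assumptions no consumer sits idle, because there are more partitions than consumers: 75 consumers read two partitions each and the remaining 250 read one. Growing the group past the partition count would leave the extra consumers without work.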