Flurry Analytic Backend - Processing Terabytes of Data in Real-time
Presentation Transcript

    • Processing Terabytes of Data in Real-Time
      Anthony Watkins, Senior Director of Developer Relations, Flurry (www.flurry.com)
      November 14, 2013, @flurrymobile @antwatkins
    • Flurry is a leading mobile advertising and analytics provider (diagram: Publisher, Advertiser, Audience)
        • AppCircle: 10,000+ applications; 300M devices/month; 120M conversions/month
        • AppSpot: 2,500+ applications; 250M devices/month; 7.5B impressions/month
        • Analytics: 400,000 applications; 1.2B devices/month; 1.9T data points/month
    • Topics: The Path to Real-Time Processing
        • Why Flurry switched from a MapReduce framework to pipeline processing
        • How Flurry uses Kafka in data processing
        • Tuning Kafka to work in Flurry's environment
        • Flurry's monitoring and error handling of streams
    • The Why
    • Past Processing Model (diagram: Device Reports, Batch Collectors, NoSQL DataStore, MapReduce jobs, External Action)
    • Flurry Analytics MapReduce Architecture (diagram labels)
        • Web Layer: Agent Portal and Developer Portal (Jetty); HTTP binary-encoded data
        • Data Log Processor; raw data log archive (HDFS)
        • Hadoop/HBase: normalized data storage, user profile data, Hadoop MapReduce
        • Metrics Processing: Metrics Computer, metrics table (cube), Hadoop MapReduce
        • MySQL
    • Data Collection and Processing in MapReduce: Pros (diagram: MapReduce jobs)
    • Data Collection and Processing in MapReduce: Cons (diagram: Device Reports, MapReduce jobs; job time vs. startup time)
    • The Move to Kafka
    • About Kafka: Origin (timeline: November 2010, June 2011, November 2012)
    • About Kafka (diagram: multiple producers, a Kafka broker, multiple consumers)
    • About Kafka: the Kafka broker (diagram: anatomy of a partition; image courtesy of http://kafka.apache.org/images/log_anatomy.png)
    • About Kafka (diagram: Producers 1 to N write to a Kafka cluster; Broker 1 holds partitions P0 and P2, Broker 2 holds P1 and P3; a consumer group with consumers C1, C2, C3)
    • Why Kafka for Flurry (diagram: Device Reports, MapReduce jobs, Kafka; startup time)
    • Introducing the Data Log Consumer (DLC) (diagram labels)
        • Web Layer: Agent Portal and Developer Portal (Jetty); HTTP binary-encoded data
        • Kafka, read by the Data Log Consumer
        • Hadoop/HBase: HDFS, normalized data storage, user profile data
        • Metrics Processing: Metrics Computer, metrics table (cube), Hadoop MapReduce
        • MySQL
    • Tuning Kafka for Flurry: Challenges
        • ZooKeeper timeouts
        • Completely asynchronous service
        • Default fsync interval
        • Commit threshold from local environments
    • How Flurry Uses Kafka: Infrastructure and Setup (diagram: a Kafka cluster with brokers B1, B2, B3; a topic with partitions P1, P2, …, P400; a consumer group with consumers C1, C2, …, C325)
    • Flurry Monitoring / Error Handling
        • Monitoring: alerts
        • Error handling: consumer failure, broker failure
    • Next Steps: Kafka 0.8 (diagram: Kafka, Data Log Consumers, HDFS; a Kafka cluster where Broker 1 holds P0 and P2 and Broker 2 holds P1 and P3, plus replica partitions P0', P1', P2', P3')
    • Next Steps: Extended Pipeline (diagram: Input Data, Real-Time Consumer/Producer Systems, Batch Collectors, NoSQL DataStore, MapReduce jobs, External Action)
    • Next Steps: Topics and Consumer Groups, Infrastructure and Setup (diagram: Topic 1 and Topic 2 read by Consumer Group 1 (C1, C2, …, CN), Consumer Group 2 (C1', C2', …, CN'), through Consumer Group N (C1'', C2'', …, CN''))
    • Thank you
      anthony@flurry.com, blog.flurry.com, @flurrymobile @antwatkins
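The MapReduce "Cons" and "Why Kafka for Flurry" slides contrast job time with startup time. The Python sketch below models that trade-off. It assumes each MapReduce run pays a fixed startup cost before doing any work. The 60-second figure is hypothetical, not a Flurry measurement. Running closer to real time means smaller batches, and the smaller the batch, the larger the share of wall-clock time lost to startup.

```python
def batch_latency(startup_s: float, job_s: float) -> float:
    """End-to-end time for one batch job: fixed startup plus actual work."""
    return startup_s + job_s


def overhead_fraction(startup_s: float, job_s: float) -> float:
    """Share of the batch's wall-clock time spent on startup alone."""
    return startup_s / batch_latency(startup_s, job_s)


# Hypothetical startup cost, chosen only to illustrate the trend.
STARTUP_S = 60.0
for job_s in (30.0, 300.0, 3000.0):
    print(f"job={job_s:>6.0f}s  latency={batch_latency(STARTUP_S, job_s):>6.0f}s  "
          f"startup share={overhead_fraction(STARTUP_S, job_s):.0%}")
```

A long-running consumer pays its startup cost once rather than once per batch. That is the case for a pipeline continuously fed by Kafka.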
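The partition image on the "About Kafka" broker slide shows a partition as an ordered, append-only sequence of messages, each identified by an offset. Below is a toy in-memory model in Python. It assumes sequential integer offsets and ignores log segments, retention, and persistence. The class and method names are illustrative, not Kafka's API.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: an ordered, append-only list of messages.

    Existing entries are never modified; consumers read by offset."""
    messages: list[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message and return the offset it was assigned."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> list[tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs starting at offset."""
        end = min(offset + max_messages, len(self.messages))
        return [(i, self.messages[i]) for i in range(offset, end)]

    def log_end_offset(self) -> int:
        """Offset the next appended message will receive."""
        return len(self.messages)


p = Partition()
p.append(b"first")
p.append(b"second")
print(p.read(1))  # [(1, b'second')]
```

Because reads never remove data, any number of consumers can read the same partition, each from its own offset.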
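The Kafka cluster slide shows four partitions (P0 to P3) spread over two brokers and read by a consumer group of three consumers. Within a group, Kafka hands each partition to exactly one consumer. The Python sketch below uses a simple range-style split, loosely modeled on Kafka's range assignment. The function name and tie-breaking are illustrative assumptions.

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions into contiguous ranges, one range per consumer.

    Each partition goes to exactly one consumer; when the division is
    uneven, the first consumers each take one extra partition."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


print(range_assign([0, 1, 2, 3], ["C1", "C2", "C3"]))
# {'C1': [0, 1], 'C2': [2], 'C3': [3]}
```

When a group has more consumers than partitions, the surplus consumers receive an empty list and sit idle.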
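One challenge on the tuning slide is the commit threshold: how often a consumer records the offset it has processed. Committing rarely means more messages are reprocessed after a crash. Committing often adds load on the offset store, which was ZooKeeper for Kafka consumers of that era. The Python sketch below commits on a count-or-time threshold. `OffsetCommitter` and `commit_fn` are hypothetical names, and the defaults are placeholders rather than Flurry's settings.

```python
import time
from typing import Callable


class OffsetCommitter:
    """Commit the last processed offset after N messages or T seconds.

    commit_fn stands in for whatever actually persists the offset."""

    def __init__(self, commit_fn: Callable[[int], None], max_messages: int = 1000,
                 max_interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.commit_fn = commit_fn
        self.max_messages = max_messages
        self.max_interval_s = max_interval_s
        self.clock = clock
        self.pending = 0
        self.last_offset = -1
        self.last_commit = clock()

    def record(self, offset: int) -> bool:
        """Note that `offset` was processed; return True if a commit happened.

        Time is checked only when a message arrives; a real consumer
        would also commit on shutdown."""
        self.pending += 1
        self.last_offset = offset
        if (self.pending >= self.max_messages
                or self.clock() - self.last_commit >= self.max_interval_s):
            self.commit_fn(self.last_offset)
            self.pending = 0
            self.last_commit = self.clock()
            return True
        return False
```

The injected `clock` parameter lets the threshold logic be tested without sleeping.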
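The "Infrastructure and Setup" slide shows one consumer group of 325 consumers reading a topic with 400 partitions. The slide does not say how Flurry assigned them. Assuming the partitions are spread as evenly as possible, the split works out as:

```latex
400 = 325 \times 1 + 75
\;\Rightarrow\;
\underbrace{75 \times 2}_{\text{two partitions each}}
+ \underbrace{(325 - 75) \times 1}_{\text{one partition each}}
= 150 + 250 = 400
```

Because the group has fewer consumers than partitions, no consumer is left idle.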
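The monitoring slide lists alerts without saying what triggers them. One common signal, assumed here rather than taken from the slide, is consumer lag: how far a group's committed offset trails the end of each partition. Below is a minimal Python sketch with a hypothetical threshold. A partition with no committed offset is treated as unread from offset 0.

```python
def consumer_lag(log_end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: messages written but not yet committed by the group."""
    return {p: end - committed.get(p, 0) for p, end in log_end_offsets.items()}


def lag_alerts(log_end_offsets: dict[int, int], committed: dict[int, int],
               threshold: int) -> list[int]:
    """Partitions whose lag exceeds the (hypothetical) alert threshold."""
    lag = consumer_lag(log_end_offsets, committed)
    return sorted(p for p, behind in lag.items() if behind > threshold)


ends = {0: 1_000, 1: 1_200, 2: 5_000}
done = {0: 990, 1: 1_200, 2: 1_000}
print(lag_alerts(ends, done, threshold=500))  # [2]
```

Lag that keeps growing on every partition points to a consumer-side problem. A stall on only some partitions can indicate a failed consumer or broker.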
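The "Next Steps: 0.8" slide adds primed partitions (P0' to P3'), which read as replicas, since Kafka 0.8 introduced intra-cluster replication. The Python sketch below is a simplified round-robin placement that puts each partition's copies on consecutive brokers. Kafka's own assignment logic is more involved. Broker indices are 0-based, so broker 0 corresponds to "Broker 1" on the slide.

```python
def place_replicas(num_partitions: int, num_brokers: int,
                   replication_factor: int) -> dict[int, list[int]]:
    """Simplified round-robin replica placement (illustrative only).

    Partition p's first copy (the leader) goes on broker p % num_brokers and
    each further copy on the next broker, so no broker holds two copies of
    the same partition."""
    if replication_factor > num_brokers:
        raise ValueError("need at least as many brokers as replicas")
    return {
        p: [(p + r) % num_brokers for r in range(replication_factor)]
        for p in range(num_partitions)
    }


print(place_replicas(4, 2, 2))
# {0: [0, 1], 1: [1, 0], 2: [0, 1], 3: [1, 0]}
```

With two brokers this matches the slide's layout. Broker 1 leads P0 and P2 and holds copies of P1 and P3, and Broker 2 does the reverse, so either broker can fail without losing a partition.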
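The "Extended Pipeline" slide adds real-time "Consumer/Producer Systems", stages that read one stream and publish derived data for the next stage. The Python sketch below shows such a consume-transform-produce stage. Deques stand in for Kafka topics, and the `session:`/`crash:` message format is hypothetical.

```python
from collections import deque
from typing import Callable, Optional


def run_stage(source: deque, sink: deque,
              transform: Callable[[str], Optional[str]]) -> int:
    """Drain `source`, apply `transform`, and publish non-None results to `sink`.

    Returns the number of messages read. A real stage would use a Kafka
    consumer and producer instead of deques."""
    read = 0
    while source:
        message = source.popleft()
        read += 1
        result = transform(message)
        if result is not None:  # None means "filter this message out"
            sink.append(result)
    return read


raw = deque(["session:a", "crash:b", "session:c"])
sessions: deque = deque()
count = run_stage(raw, sessions,
                  lambda m: m.upper() if m.startswith("session:") else None)
print(count, list(sessions))  # 3 ['SESSION:A', 'SESSION:C']
```

Chaining such stages through topics lets each one scale and fail independently, while batch collectors and MapReduce jobs keep reading the same inputs.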
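The "Topics and Consumer Groups" slide shows several consumer groups reading the same topics. Kafka tracks progress per group, so each group receives every message independently, while consumers within one group share the partitions. The Python toy model below covers only the per-group part, using a single partition and hypothetical group names.

```python
class Topic:
    """Toy single-partition topic with a separate read offset per consumer group."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.group_offsets: dict[str, int] = {}

    def publish(self, message: str) -> None:
        """Append a message to the end of the log."""
        self.log.append(message)

    def poll(self, group: str, max_messages: int = 100) -> list[str]:
        """Return unread messages for `group` and advance only that group's offset."""
        start = self.group_offsets.get(group, 0)
        batch = self.log[start:start + max_messages]
        self.group_offsets[group] = start + len(batch)
        return batch


topic = Topic()
topic.publish("event-1")
topic.publish("event-2")
print(topic.poll("metrics"))  # ['event-1', 'event-2']
print(topic.poll("archive"))  # ['event-1', 'event-2']
print(topic.poll("metrics"))  # []
```

Adding a new consumer group therefore adds a new downstream use of the data without slowing or changing existing groups.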