www.flurry.com
November 14, 2013
Anthony Watkins, Senior Director of Developer Relations
Processing Terabytes of Data in R...

Flurry Analytic Backend - Processing Terabytes of Data in Real-time

  1. Processing Terabytes of Data in Real-Time. Anthony Watkins, Senior Director of Developer Relations. November 14, 2013. www.flurry.com, @flurrymobile, @antwatkins
  2. www.flurry.com
  3. Flurry is a leading mobile advertising and analytics provider. Publisher, Advertiser, Audience.
     • AppCircle: Applications: 10,000+; Devices/month: 300M; Conversions/month: 120M
     • AppSpot: Applications: 2,500+; Devices/month: 250M; Impressions/month: 7.5B
     • Analytics: Applications: 400,000; Devices/month: 1.2B; Data points/month: 1.9T
  4. Topics: The Path to Real-Time Processing
     • Why Flurry switched from a MapReduce framework to pipeline processing
     • How Flurry uses Kafka in data processing
     • Tuning of Kafka to work in Flurry’s environment
     • Flurry monitoring and error handling of streams
  5. The Why
  6. Past Processing Model. Diagram labels: Device Reports; NoSQL DataStore; Batch; Collectors; MapReduce (jobs); External Action.
  7. Flurry Analytics MapReduce Architecture. Diagram labels: Agent Portal; Data Log Processor; Developer Portal; Metrics Computer; HDFS; HBase; HBase; Hadoop/HBase; Jetty; Jetty; HTTP Binary Encoded Data; Raw Data Log Archive; Metrics Table (Cube); Normalized Data Storage; User Profile Data; MySQL; Hadoop Map/Reduce; Hadoop Map/Reduce; Web Layer; Metrics Processing.
  8. Data Collection and Processing in MR: Pros. Diagram labels: MapReduce (jobs).
  9. Data Collection and Processing in MR: Cons. Diagram labels: Device Reports; MapReduce (jobs); Job Time; Startup Time.
  10. Flurry Kafka. The Move to Kafka.
  11. About Kafka: Origin. November 2010; June 2011; November 2012.
  12. About Kafka. Diagram labels: Producer, Producer, Producer; Kafka Broker; Consumer, Consumer, Consumer.
  13. About Kafka. Diagram labels: Kafka Broker. Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
  14. About Kafka. Diagram labels: Producer 1, Producer 2, Producer N; Kafka Cluster: Broker 1 (P0, P2), Broker 2 (P1, P3); Consumer Group: C1, C2, C3.
  15. Why Kafka for Flurry. Diagram labels: Device Reports; MapReduce (jobs); Kafka; Startup Time.
  16. Introducing the Data Log Consumer (DLC). Diagram labels: Agent Portal; Data Log Consumer; Developer Portal; Metrics Computer; HDFS; HBase; HBase; Hadoop/HBase; Jetty; Jetty; HTTP Binary Encoded Data; Metrics Table (Cube); Normalized Data Storage; User Profile Data; MySQL; Kafka; Hadoop Map/Reduce; Web Layer; Metrics Processing.
  17. Tuning Kafka for Flurry: Challenges
     • Zookeeper timeouts
     • Completely async service
     • Default fsync interval
     • Commit threshold from local environments
  18. How Flurry Uses Kafka: Infrastructure and Setup. Diagram labels: Consumer Group: C1, C2, C…, C325; Kafka Cluster: B1, B2, B3; Broker: P1, P2, P…, P400; Topic.
  19. Flurry Monitoring / Error Handling. Monitoring
     • Alerts
     • Consumer Failure
     • Broker Failure
     Error Handling
  20. Next Steps: 0.8. Diagram labels: Data Log Consumer; HDFS; Kafka; Data Log Consumer; Kafka; Kafka Cluster: Broker 1 (P0, P2), Broker 2 (P1, P3); P1’, P3’, P0’, P2’.
  21. Next Steps: Extended Pipeline. Diagram labels: Input Data; NoSQL DataStore; Real-Time Batch; Collectors; Consumer/Producer Systems; MapReduce (jobs); External Action; External Action.
  22. Next Steps: Topics and Consumer Groups: Infrastructure and Setup. Diagram labels: Consumer Group 2: C1’, C2’, C…, CN’; Topic 1; Consumer Group 1: C1, C2, C…, CN; Consumer Group N: C1’’, C2’’, C…, CN’’; Topic 2.
  23. Thank you. www.flurry.com; November 14, 2013; anthony@flurry.com; blog.flurry.com; @flurrymobile; @antwatkins
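
Slide 13 shows a Kafka partition as an append-only log. The following is a minimal sketch of that idea, not Kafka's implementation: the class `PartitionLog`, the message strings, and the consumer names are all hypothetical, and offsets are modeled as simple list indices, which is a simplification.

```python
class PartitionLog:
    """Toy append-only log for one partition (illustrative only)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # New messages always go at the end; the return value is the
        # position (offset) assigned to the message.
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        # Reading never removes data; it returns a slice starting at offset.
        return self._messages[offset:offset + max_messages]


log = PartitionLog()
for report in ["report-a", "report-b", "report-c"]:
    log.append(report)

# Each consumer tracks its own position, so two consumers can read the
# same log independently and at different speeds.
positions = {"consumer-1": 0, "consumer-2": 2}
batch_1 = log.read(positions["consumer-1"], max_messages=2)
batch_2 = log.read(positions["consumer-2"], max_messages=2)
positions["consumer-1"] += len(batch_1)
positions["consumer-2"] += len(batch_2)
```

Because the log is only ever appended to and each reader keeps its own offset, a slow or restarted consumer can pick up where it left off without affecting the others.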
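
Slide 18 lists a consumer group of 325 consumers (C1 to C325) reading a topic with 400 partitions (P1 to P400). The slides do not say how partitions are spread over the consumers. As a sketch only, assuming a contiguous range-style split in which each partition is owned by exactly one consumer in the group, the arithmetic looks like this. The function `assign_partitions` and the zero-based partition indices are assumptions made for illustration.

```python
def assign_partitions(num_partitions, consumers):
    """Split partition indices into contiguous ranges, one range per consumer.

    The first (num_partitions % len(consumers)) consumers receive one
    extra partition so that every partition has exactly one owner.
    """
    ordered = sorted(consumers)
    base, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    start = 0
    for index, consumer in enumerate(ordered):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# Hypothetical consumer names, zero-padded so they sort in numeric order.
consumers = [f"C{i:03d}" for i in range(1, 326)]
assignment = assign_partitions(400, consumers)

with_two = sum(1 for parts in assignment.values() if len(parts) == 2)
with_one = sum(1 for parts in assignment.values() if len(parts) == 1)
print(with_two, with_one)
```

Under this assumption, 400 = 325 × 1 + 75, so 75 consumers would own two partitions and the remaining 250 would own one. Having more partitions than consumers leaves room to add consumers later without any of them sitting idle.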
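
Slide 17 names a "commit threshold" among the tuning challenges without defining it. One plausible reading, stated here as an assumption, is the number of messages a consumer processes between offset commits. The sketch below uses a hypothetical `ThresholdCommitter` class (not a Kafka API) to show the trade-off that such a threshold controls: fewer commits versus more messages replayed after a crash.

```python
class ThresholdCommitter:
    """Commit the consumer position once every `commit_threshold` messages."""

    def __init__(self, commit_threshold):
        self.commit_threshold = commit_threshold
        self.processed_offset = -1   # last message handled
        self.committed_offset = -1   # last position durably recorded
        self.uncommitted = 0
        self.commit_count = 0

    def record(self, offset):
        self.processed_offset = offset
        self.uncommitted += 1
        if self.uncommitted >= self.commit_threshold:
            self.commit()

    def commit(self):
        self.committed_offset = self.processed_offset
        self.uncommitted = 0
        self.commit_count += 1

    def replayed_after_crash(self):
        # Messages handled since the last commit would be read again
        # if the consumer restarted from the committed position.
        return self.processed_offset - self.committed_offset


small = ThresholdCommitter(commit_threshold=10)
large = ThresholdCommitter(commit_threshold=1000)
for offset in range(2505):
    small.record(offset)
    large.record(offset)

print(small.commit_count, small.replayed_after_crash())
print(large.commit_count, large.replayed_after_crash())
```

In this toy run the small threshold commits 250 times and would replay 5 messages, while the large threshold commits twice and would replay 505. A value that works on a lightly loaded local machine may therefore behave very differently at production volume.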