Nyc storm meetup_robdoherty

1,765 views
1,399 views

Published on

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

Published in: Technology, Business
0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,765
On SlideShare
0
From Embeds
0
Number of Embeds
48
Actions
Shares
0
Downloads
30
Comments
0
Likes
6
Embeds 0
No embeds

No notes for slide

Nyc storm meetup_robdoherty

  1. 1. Storm at Rob Doherty Senior Backend Engineer rdoherty@outbrain.com @robdoherty2
  2. 2. What is Outbrain?
  3. 3. Before Storm ● Custom distributed processing system ● Python and ZMQ ● Advantages: ○ Simple components ○ Well-understood ● Disadvantages: ○ Did not scale ○ Batch-processing
  4. 4. Kafka + Storm ● Kafka: high-throughput distributed messaging ● Storm: distributed, real-time computation
  5. 5. Why Kafka? ● Need method to buffer clicks into “stream” ● Kafka + Storm common pattern for click tracking
  6. 6. Why Storm? ● “Real time” (15s latency requirements) ● Fault tolerance ● Easy to manage parallelism ● Stream grouping ● Active community ● Open-source project
  7. 7. Nginx Servers Kafka Cluster Storm Topology Elastic Load Balancer Customer Traffic AWS MongoDB Redis Algo API Architecture
  8. 8. Kafka Cluster ● 40 Producers (8 m1.large instances) ○ Python brod ● 4 Brokers (4 m1.large instances) 10k Clicks per second (peak) 14B Clicks per month Kafka v0.7.2
  9. 9. Storm Topology ● 40 Supervisors (c1.xlarge instances) ● 35 Bolts, 1 Kafka spout ● 250+ Executors (worker threads) 160k+ tuples executed per second Storm v0.82 Leiningen v1.7
  10. 10. Customer Traffic Kafka Spout Aggregate 15s Aggregate 5m Position Customer Social Arrangement Front Page @Handle Storm Topology
  11. 11. Challenges ● Shell Bolts ● Anchor Bolts/Replaying Stream ● Acking Tuples ● Monitoring
  12. 12. Monitoring ● Scribe Logging ● Munin + Nagios ● JMX-JMXTrans + Ganglia ● Storm UI ● Thrift interface into Nimbus + D3
  13. 13. Monitoring ● Scribe Logging ● Munin + Nagios ● JMX-JMXTrans + Ganglia ● Storm UI ● Thrift interface into Nimbus + D3
  14. 14. Future Plans ● Load testing ● Break topology into smaller pieces ● Move from AWS to private data center
  15. 15. Thank you Rob Doherty Senior Backend Engineer rdoherty@outbrain.com @robdoherty2

×