Nyc storm meetup_robdoherty

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

Published in: Technology, Business

  1. Storm at Outbrain
     Rob Doherty, Senior Backend Engineer
     rdoherty@outbrain.com, @robdoherty2
  2. What is Outbrain?
  3. Before Storm
     ● Custom distributed processing system
     ● Python and ZMQ
     ● Advantages:
       ○ Simple components
       ○ Well-understood
     ● Disadvantages:
       ○ Did not scale
       ○ Batch-processing
  4. Kafka + Storm
     ● Kafka: high-throughput distributed messaging
     ● Storm: distributed, real-time computation
  5. Why Kafka?
     ● Need a method to buffer clicks into a “stream”
     ● Kafka + Storm is a common pattern for click tracking
  6. Why Storm?
     ● “Real time” (15s latency requirements)
     ● Fault tolerance
     ● Easy to manage parallelism
     ● Stream grouping
     ● Active community
     ● Open-source project
  7. Architecture
     (diagram labels: Nginx Servers, Kafka Cluster, Storm Topology, Elastic Load Balancer, Customer Traffic, AWS, MongoDB, Redis, Algo API)
  8. Kafka Cluster
     ● 40 Producers (8 m1.large instances)
       ○ Python brod
     ● 4 Brokers (4 m1.large instances)
     ● 10k clicks per second (peak)
     ● 14B clicks per month
     ● Kafka v0.7.2
  9. Storm Topology
     ● 40 Supervisors (c1.xlarge instances)
     ● 35 Bolts, 1 Kafka spout
     ● 250+ Executors (worker threads)
     ● 160k+ tuples executed per second
     ● Storm v0.8.2
     ● Leiningen v1.7
  10. Storm Topology
     (diagram labels: Customer Traffic, Kafka Spout, Aggregate 15s, Aggregate 5m, Position, Customer, Social, Arrangement, Front Page, @Handle)
  11. Challenges
     ● Shell Bolts
     ● Anchor Bolts/Replaying Stream
     ● Acking Tuples
     ● Monitoring
  12. Monitoring
     ● Scribe Logging
     ● Munin + Nagios
     ● JMX-JMXTrans + Ganglia
     ● Storm UI
     ● Thrift interface into Nimbus + D3
  13. Monitoring
     ● Scribe Logging
     ● Munin + Nagios
     ● JMX-JMXTrans + Ganglia
     ● Storm UI
     ● Thrift interface into Nimbus + D3
  14. Future Plans
     ● Load testing
     ● Break topology into smaller pieces
     ● Move from AWS to private data center
  15. Thank you
     Rob Doherty, Senior Backend Engineer
     rdoherty@outbrain.com, @robdoherty2
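
Slide 10 names "Aggregate 15s" and "Aggregate 5m" stages but does not show how they work. The sketch below is one plausible reading, not Outbrain's implementation: it counts pings in 15-second tumbling windows and then rolls those counts up into 5-minute windows. It is plain Python with no Storm or Kafka dependency, and the `(timestamp, customer_id)` ping layout and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

WINDOW_15S = 15   # seconds, matching the "Aggregate 15s" label
WINDOW_5M = 300   # seconds, matching the "Aggregate 5m" label


def window_start(timestamp, width):
    """Return the start of the tumbling window that contains timestamp."""
    return int(timestamp) - int(timestamp) % width


def aggregate(pings, width):
    """Count pings per window start and customer.

    pings is an iterable of (unix_timestamp, customer_id) pairs; this
    layout is hypothetical and not taken from the slides.
    """
    counts = defaultdict(Counter)
    for timestamp, customer in pings:
        counts[window_start(timestamp, width)][customer] += 1
    return counts


def roll_up(counts, width):
    """Merge fine-grained window counts into wider windows."""
    merged = defaultdict(Counter)
    for start, per_customer in counts.items():
        merged[window_start(start, width)].update(per_customer)
    return merged


# Hypothetical sample pings: (seconds since some epoch, customer id).
pings = [(0, "a"), (3, "a"), (14, "b"), (15, "a"), (299, "b"), (300, "a")]
by_15s = aggregate(pings, WINDOW_15S)
by_5m = roll_up(by_15s, WINDOW_5M)
```

Rolling the 15-second counts up, rather than recounting raw pings, keeps the wider stage cheap because it only ever sees one record per window and customer. In a Storm topology the same idea would typically be split across two bolts, with a fields grouping on the customer id so that each customer's counts land on the same task.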
