Nyc storm meetup_robdoherty

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

Published in: Technology, Business
Transcript of "Nyc storm meetup_robdoherty"

  1. Storm at Outbrain
     Rob Doherty
     Senior Backend Engineer
     rdoherty@outbrain.com
     @robdoherty2
  2. What is Outbrain?
  3. Before Storm
     ● Custom distributed processing system
     ● Python and ZMQ
     ● Advantages:
       ○ Simple components
       ○ Well-understood
     ● Disadvantages:
       ○ Did not scale
       ○ Batch-processing
  4. Kafka + Storm
     ● Kafka: high-throughput distributed messaging
     ● Storm: distributed, real-time computation
  5. Why Kafka?
     ● Needed a way to buffer clicks into a “stream”
     ● Kafka + Storm is a common pattern for click tracking
  6. Why Storm?
     ● “Real time” (15s latency requirements)
     ● Fault tolerance
     ● Easy to manage parallelism
     ● Stream grouping
     ● Active community
     ● Open-source project
  7. Architecture
     (Diagram; labels: Customer Traffic, Elastic Load Balancer, Nginx Servers, Kafka Cluster, Storm Topology, MongoDB, Redis, Algo, API, AWS)
  8. Kafka Cluster
     ● 40 Producers (8 m1.large instances)
       ○ Python brod
     ● 4 Brokers (4 m1.large instances)
     10k clicks per second (peak)
     14B clicks per month
     Kafka v0.7.2
  9. Storm Topology
     ● 40 Supervisors (c1.xlarge instances)
     ● 35 Bolts, 1 Kafka spout
     ● 250+ Executors (worker threads)
     160k+ tuples executed per second
     Storm v0.8.2
     Leiningen v1.7
  10. Storm Topology
      (Diagram; labels: Customer Traffic, Kafka Spout, Aggregate 15s, Aggregate 5m, Position, Customer, Social, Arrangement, Front Page, @Handle)
  11. Challenges
      ● Shell Bolts
      ● Anchor Bolts/Replaying Stream
      ● Acking Tuples
      ● Monitoring
  12. Monitoring
      ● Scribe Logging
      ● Munin + Nagios
      ● JMX-JMXTrans + Ganglia
      ● Storm UI
      ● Thrift interface into Nimbus + D3
  13. Monitoring
      ● Scribe Logging
      ● Munin + Nagios
      ● JMX-JMXTrans + Ganglia
      ● Storm UI
      ● Thrift interface into Nimbus + D3
  14. Future Plans
      ● Load testing
      ● Break topology into smaller pieces
      ● Move from AWS to private data center
  15. Thank you
      Rob Doherty
      Senior Backend Engineer
      rdoherty@outbrain.com
      @robdoherty2
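
The Storm Topology slide names an "Aggregate 15s" stage and an "Aggregate 5m" stage but does not show how they work. As a rough illustration only, the sketch below counts clicks in 15-second tumbling windows and then rolls those counts up into 5-minute windows. It is plain Python with no Storm or Kafka dependency and is not the implementation used at Outbrain. The `(timestamp, key)` click shape, the key names, and the function names are all assumptions made for the example.

```python
from collections import Counter

WINDOW_15S = 15    # seconds, matching the "Aggregate 15s" label
WINDOW_5M = 300    # seconds, matching the "Aggregate 5m" label


def window_start(ts, width):
    """Return the start (in whole seconds) of the tumbling window containing ts."""
    ts = int(ts)
    return ts - ts % width


def aggregate(clicks, width):
    """Count clicks per (window_start, key) for tumbling windows of `width` seconds.

    `clicks` is an iterable of (timestamp_seconds, key) pairs (a hypothetical shape).
    """
    counts = Counter()
    for ts, key in clicks:
        counts[(window_start(ts, width), key)] += 1
    return counts


def roll_up(counts, width):
    """Merge finer-grained window counts into coarser windows of `width` seconds."""
    rolled = Counter()
    for (start, key), n in counts.items():
        rolled[(window_start(start, width), key)] += n
    return rolled


# Hypothetical sample data: (timestamp in seconds, grouping key)
clicks = [
    (0, "front_page"),
    (14, "front_page"),
    (15, "front_page"),
    (299, "social"),
    (300, "social"),
]

per_15s = aggregate(clicks, WINDOW_15S)
per_5m = roll_up(per_15s, WINDOW_5M)
```

Feeding the 15-second counts into the 5-minute roll-up, rather than recounting raw clicks, mirrors the chained stages suggested by the slide labels. In a real topology each stage would be a bolt that emits its counts downstream when a window closes.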