Nyc storm meetup_robdoherty
Upcoming SlideShare
Loading in...5
×
 

Nyc storm meetup_robdoherty

on

  • 1,277 views

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

Statistics

Views

Total Views
1,277
Views on SlideShare
1,267
Embed Views
10

Actions

Likes
5
Downloads
25
Comments
0

2 Embeds 10

https://www.linkedin.com 8
http://www.linkedin.com 2

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Nyc storm meetup_robdoherty Nyc storm meetup_robdoherty Presentation Transcript

  • Storm at Rob Doherty Senior Backend Engineer rdoherty@outbrain.com @robdoherty2
  • What is Outbrain?
  • Before Storm ● Custom distributed processing system ● Python and ZMQ ● Advantages: ○ Simple components ○ Well-understood ● Disadvantages: ○ Did not scale ○ Batch-processing
  • Kafka + Storm ● Kafka: high-throughput distributed messaging ● Storm: distributed, real-time computation
  • Why Kafka? ● Need method to buffer clicks into “stream” ● Kafka + Storm common pattern for click tracking
  • Why Storm? ● “Real time” (15s latency requirements) ● Fault tolerance ● Easy to manage parallelism ● Stream grouping ● Active community ● Open-source project
  • Nginx Servers Kafka Cluster Storm Topology Elastic Load Balancer Customer Traffic AWS MongoDB Redis Algo API Architecture
  • Kafka Cluster ● 40 Producers (8 m1.large instances) ○ Python brod ● 4 Brokers (4 m1.large instances) 10k Clicks per second (peak) 14B Clicks per month Kafka v0.7.2
  • Storm Topology ● 40 Supervisors (c1.xlarge instances) ● 35 Bolts, 1 Kafka spout ● 250+ Executors (worker threads) 160k+ tuples executed per second Storm v0.82 Leiningen v1.7
  • Customer Traffic Kafka Spout Aggregate 15s Aggregate 5m Position Customer Social Arrangement Front Page @Handle Storm Topology
  • Challenges ● Shell Bolts ● Anchor Bolts/Replaying Stream ● Acking Tuples ● Monitoring
  • Monitoring ● Scribe Logging ● Munin + Nagios ● JMX-JMXTrans + Ganglia ● Storm UI ● Thrift interface into Nimbus + D3
  • Monitoring ● Scribe Logging ● Munin + Nagios ● JMX-JMXTrans + Ganglia ● Storm UI ● Thrift interface into Nimbus + D3
  • Future Plans ● Load testing ● Break topology into smaller pieces ● Move from AWS to private data center
  • Thank you Rob Doherty Senior Backend Engineer rdoherty@outbrain.com @robdoherty2