Nyc storm meetup_robdoherty

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

  1. Storm at ● Rob Doherty ● Senior Backend Engineer ● rdoherty@outbrain.com ● @robdoherty2
  2. What is Outbrain?
  3. Before Storm ● Custom distributed processing system ● Python and ZMQ ● Advantages: ○ Simple components ○ Well-understood ● Disadvantages: ○ Did not scale ○ Batch-processing
  4. Kafka + Storm ● Kafka: high-throughput distributed messaging ● Storm: distributed, real-time computation
  5. Why Kafka? ● Need method to buffer clicks into “stream” ● Kafka + Storm common pattern for click tracking
  6. Why Storm? ● “Real time” (15s latency requirements) ● Fault tolerance ● Easy to manage parallelism ● Stream grouping ● Active community ● Open-source project
  7. Architecture ● Nginx Servers ● Kafka Cluster ● Storm Topology ● Elastic Load Balancer ● Customer Traffic ● AWS ● MongoDB ● Redis ● Algo API
  8. Kafka Cluster ● 40 Producers (8 m1.large instances) ○ Python brod ● 4 Brokers (4 m1.large instances) ● 10k clicks per second (peak) ● 14B clicks per month ● Kafka v0.7.2
  9. Storm Topology ● 40 Supervisors (c1.xlarge instances) ● 35 Bolts, 1 Kafka spout ● 250+ Executors (worker threads) ● 160k+ tuples executed per second ● Storm v0.82 ● Leiningen v1.7
  10. Storm Topology ● Customer Traffic ● Kafka Spout ● Aggregate 15s ● Aggregate 5m ● Position ● Customer ● Social ● Arrangement ● Front Page ● @Handle
  11. Challenges ● Shell Bolts ● Anchor Bolts/Replaying Stream ● Acking Tuples ● Monitoring
  12. Monitoring ● Scribe Logging ● Munin + Nagios ● JMX-JMXTrans + Ganglia ● Storm UI ● Thrift interface into Nimbus + D3
  13. Monitoring ● Scribe Logging ● Munin + Nagios ● JMX-JMXTrans + Ganglia ● Storm UI ● Thrift interface into Nimbus + D3
  14. Future Plans ● Load testing ● Break topology into smaller pieces ● Move from AWS to private data center
  15. Thank you ● Rob Doherty ● Senior Backend Engineer ● rdoherty@outbrain.com ● @robdoherty2
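
The topology slide names an "Aggregate 15s" stage feeding an "Aggregate 5m" stage, but the deck does not show how either is implemented. The following is a minimal sketch of that two-stage idea in plain Python, not Outbrain's code and not the Storm API: pings are bucketed into 15-second tumbling windows, and those windows are then rolled up into 5-minute windows. The event shape `(timestamp_seconds, key, count)`, the function names, and the sample URLs are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def window_start(ts, width):
    """Return the start (in seconds) of the tumbling window containing ts."""
    return int(ts // width) * width


def aggregate(events, width):
    """Sum counts per key within tumbling windows of `width` seconds.

    `events` is an iterable of (timestamp_seconds, key, count) tuples;
    this shape is hypothetical, chosen only for the sketch.
    Returns {window_start: Counter({key: total})}.
    """
    windows = defaultdict(Counter)
    for ts, key, count in events:
        windows[window_start(ts, width)][key] += count
    return dict(windows)


def rollup(windows, width):
    """Re-aggregate already-windowed counts into wider windows."""
    events = (
        (start, key, count)
        for start, counts in windows.items()
        for key, count in counts.items()
    )
    return aggregate(events, width)


# Hypothetical sample pings: (timestamp in seconds, page, count).
pings = [
    (0.5, "/home", 1),
    (3.2, "/home", 1),
    (14.9, "/about", 1),
    (15.0, "/home", 1),
    (299.9, "/home", 1),
    (300.0, "/about", 1),
]

per_15s = aggregate(pings, 15)    # first stage: 15-second windows
per_5m = rollup(per_15s, 300)     # second stage: 5-minute windows
```

Feeding the second stage from the first, rather than from raw pings, keeps the wider window's input small. In a real Storm topology each stage would presumably be a bolt, with stream grouping deciding which task sees which key.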