NYC Storm Meetup: Rob Doherty

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

© All Rights Reserved

    Presentation Transcript

    • Storm at Outbrain: Rob Doherty, Senior Backend Engineer, rdoherty@outbrain.com, @robdoherty2
    • What is Outbrain?
    • Before Storm
      ● Custom distributed processing system
      ● Python and ZMQ
      ● Advantages:
        ○ Simple components
        ○ Well understood
      ● Disadvantages:
        ○ Did not scale
        ○ Batch processing
    • Kafka + Storm
      ● Kafka: high-throughput distributed messaging
      ● Storm: distributed, real-time computation
    • Why Kafka?
      ● Need a way to buffer clicks into a “stream”
      ● Kafka + Storm is a common pattern for click tracking
    • Why Storm?
      ● “Real time” (15 s latency requirement)
      ● Fault tolerance
      ● Easy to manage parallelism
      ● Stream grouping
      ● Active community
      ● Open-source project
    • Architecture (diagram; components shown: Customer Traffic, Elastic Load Balancer, Nginx Servers, Kafka Cluster, Storm Topology, MongoDB, Redis, Algo API, AWS)
    • Kafka Cluster
      ● 40 Producers (8 m1.large instances)
        ○ Python brod
      ● 4 Brokers (4 m1.large instances)
      ● 10k Clicks per second (peak)
      ● 14B Clicks per month
      ● Kafka v0.7.2
    • Storm Topology
      ● 40 Supervisors (c1.xlarge instances)
      ● 35 Bolts, 1 Kafka spout
      ● 250+ Executors (worker threads)
      ● 160k+ tuples executed per second
      ● Storm v0.8.2, Leiningen v1.7
    • Storm Topology (diagram): Customer Traffic feeds the Kafka Spout; bolts shown: Aggregate 15s, Aggregate 5m, Position, Customer, Social, Arrangement, Front Page, @Handle
    • Challenges
      ● Shell Bolts
      ● Anchor Bolts / Replaying Stream
      ● Acking Tuples
      ● Monitoring
    • Monitoring
      ● Scribe logging
      ● Munin + Nagios
      ● JMX (via JMXTrans) + Ganglia
      ● Storm UI
      ● Thrift interface into Nimbus + D3
    • Future Plans
      ● Load testing
      ● Break the topology into smaller pieces
      ● Move from AWS to a private data center
    • Thank you! Rob Doherty, Senior Backend Engineer, rdoherty@outbrain.com, @robdoherty2
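The “Why Kafka?” slide presents Kafka as the buffer that turns clicks into a stream. The sketch below is a toy, in-memory model of that idea under stated assumptions. Producers append keyed messages to a partitioned, append-only log. A consumer pulls from its own per-partition offsets, so a consumer that falls behind simply resumes where it left off, and nothing is lost. The class names, the CRC32 partitioner and the batch sizes are hypothetical choices for illustration. This is not the Kafka or brod API.

```python
import zlib
from typing import Dict, List, Tuple


class PartitionedLog:
    """Toy partitioned, append-only log (hypothetical; not the Kafka API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, message: bytes) -> Tuple[int, int]:
        """Append a message; the key picks the partition. Returns (partition, offset)."""
        p = zlib.crc32(key) % len(self.partitions)
        self.partitions[p].append(message)
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int, max_messages: int) -> List[bytes]:
        """Return up to max_messages starting at offset; reading deletes nothing."""
        return self.partitions[partition][offset:offset + max_messages]


class Consumer:
    """Pull-based consumer that remembers one offset per partition."""

    def __init__(self, log: PartitionedLog) -> None:
        self.log = log
        self.offsets: Dict[int, int] = {p: 0 for p in range(len(log.partitions))}

    def poll(self, max_per_partition: int = 100) -> List[bytes]:
        """Fetch the next batch from every partition and advance the offsets."""
        batch: List[bytes] = []
        for p, offset in self.offsets.items():
            msgs = self.log.read(p, offset, max_per_partition)
            self.offsets[p] = offset + len(msgs)
            batch.extend(msgs)
        return batch


if __name__ == "__main__":
    log = PartitionedLog(num_partitions=4)
    for i in range(10):
        # Hypothetical key scheme: partition clicks by page.
        log.append(f"page-{i % 3}".encode(), f"click-{i}".encode())
    consumer = Consumer(log)
    first = consumer.poll(max_per_partition=1)  # a slow consumer takes a little...
    rest = consumer.poll()                      # ...and later catches up without loss
    print(len(first) + len(rest))               # 10
```

The pull model is the reason a log like this works as a buffer: producers never wait on the downstream topology. A spout that restarts only needs its saved offsets to continue.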
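The “Why Storm?” slide lists stream grouping among the reasons for choosing Storm. A fields grouping sends every tuple with the same values in the chosen fields to the same bolt task, which keeps per-key state (such as counts per page) in one place. This is a minimal sketch, assuming tuples are plain dicts. Storm hashes the grouping values in Java. This version instead uses `zlib.crc32` over a JSON encoding as a stand-in, so routing stays stable across Python runs. `choose_task` and `route` are hypothetical helpers.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Sequence

Record = Dict[str, object]  # hypothetical tuple representation: field name -> value


def choose_task(record: Record, group_fields: Sequence[str], num_tasks: int) -> int:
    """Pick a task index using only the grouping fields, so equal keys map to one task."""
    key = json.dumps([record[f] for f in group_fields]).encode("utf-8")
    return zlib.crc32(key) % num_tasks


def route(records: List[Record], group_fields: Sequence[str], num_tasks: int) -> Dict[int, List[Record]]:
    """Partition a batch of records by destination task."""
    by_task: Dict[int, List[Record]] = defaultdict(list)
    for r in records:
        by_task[choose_task(r, group_fields, num_tasks)].append(r)
    return dict(by_task)


if __name__ == "__main__":
    clicks = [
        {"page": "home", "user": 1},
        {"page": "sports", "user": 2},
        {"page": "home", "user": 3},
    ]
    # Both "home" clicks land in the same task's list, whichever task that is.
    print(route(clicks, ["page"], num_tasks=8))
```

Because only the grouping fields feed the hash, `user` has no effect on placement. Grouping on a different field set changes which tuples share a task.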
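The Kafka Cluster and Storm Topology slides quote several capacity figures. A back-of-envelope pass over them, assuming a 30-day month, puts the average click rate at roughly 5,400 per second, a little over half the quoted 10k/s peak. The tuples-per-click ratio divides two separately quoted numbers, so treat it as a rough fan-out estimate rather than a measurement.

```python
# Figures from the slides; the 30-day month is an assumption.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # 2,592,000 s

clicks_per_month = 14e9                 # "14B Clicks per month"
peak_clicks_per_sec = 10_000            # "10k Clicks per second (peak)"
tuples_per_sec = 160_000                # "160k+ tuples executed per second"
executors = 250                         # "250+ Executors"
supervisors = 40                        # "40 Supervisors"

avg_clicks_per_sec = clicks_per_month / SECONDS_PER_MONTH
peak_to_avg = peak_clicks_per_sec / avg_clicks_per_sec
# Rough estimate only: assumes the 160k tuples/s figure was observed at peak load.
tuples_per_click = tuples_per_sec / peak_clicks_per_sec
executors_per_supervisor = executors / supervisors

print(f"average clicks/s: {avg_clicks_per_sec:,.0f}")               # 5,401
print(f"peak / average: {peak_to_avg:.2f}")                         # 1.85
print(f"tuples per click (peak): {tuples_per_click:.1f}")           # 16.0
print(f"executors per supervisor: {executors_per_supervisor:.2f}")  # 6.25
```

Read this way, each click fans out to on the order of a dozen or more tuple executions across the 35 bolts. That is consistent with a topology whose work is dominated by internal tuple traffic rather than raw ingest.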
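The topology diagram names two aggregation stages, “Aggregate 15s” and “Aggregate 5m”. One plausible reading, sketched here as an assumption rather than the production design, is tumbling windows keyed by page. The first stage counts clicks per 15-second window, and the second rolls those partial counts up into 5-minute windows. Windows are computed from event timestamps so the example is deterministic; inside a Storm bolt, flushing would more likely be clock-driven. The `(timestamp, page)` schema and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Click = Tuple[float, str]  # hypothetical schema: (unix timestamp in seconds, page id)


def window_start(ts: float, width_s: int) -> int:
    """Start of the tumbling window of width_s seconds that contains ts."""
    return int(ts // width_s) * width_s


def aggregate_15s(clicks: Iterable[Click]) -> Dict[int, Counter]:
    """First stage: count clicks per page in each 15-second window."""
    out: Dict[int, Counter] = defaultdict(Counter)
    for ts, page in clicks:
        out[window_start(ts, 15)][page] += 1
    return dict(out)


def aggregate_5m(partials: Dict[int, Counter]) -> Dict[int, Counter]:
    """Second stage: roll 15 s partial counts up into 5-minute (300 s) windows.

    This is exact because 300 is a multiple of 15, so every 15 s window
    lies entirely inside one 5 m window.
    """
    out: Dict[int, Counter] = defaultdict(Counter)
    for start, counts in partials.items():
        out[window_start(start, 300)].update(counts)
    return dict(out)


if __name__ == "__main__":
    clicks = [(0.5, "home"), (14.9, "home"), (15.0, "home"), (299.0, "sports"), (300.0, "home")]
    fifteen = aggregate_15s(clicks)
    # {0: Counter({'home': 2}), 15: Counter({'home': 1}), 285: Counter({'sports': 1}), 300: Counter({'home': 1})}
    print(fifteen)
    # {0: Counter({'home': 3, 'sports': 1}), 300: Counter({'home': 1})}
    print(aggregate_5m(fifteen))
```

Pre-aggregating at 15 s means the 5-minute stage sees one small count map per window instead of every raw click. That reduction is the usual motivation for a two-tier layout like this.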
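Anchoring, replaying and acking all appear on the Challenges slide. Storm's acker tracks each spout tuple with a single 64-bit value. Every anchored tuple id is XORed in once when it is emitted and once when it is acked, so the value returns to zero exactly when the whole tuple tree has been processed. A tree that does not complete within the message timeout is failed and can be replayed by the spout. The class below is a simplified, single-process sketch of that bookkeeping. As a simplification, it reuses the root id as the first tuple's id. All names are hypothetical.

```python
import random
from typing import Dict, Iterable, Set


class AckTracker:
    """Hypothetical XOR-based ack bookkeeping, modeled on Storm's acker."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.ack_vals: Dict[int, int] = {}  # root id -> XOR of outstanding tuple ids
        self.completed: Set[int] = set()

    def new_id(self) -> int:
        """Random 64-bit tuple id; an accidental zero XOR is vanishingly unlikely."""
        return self.rng.getrandbits(64)

    def spout_emit(self) -> int:
        """The spout emits a tuple, which becomes the root of a new tuple tree."""
        root = self.new_id()
        self.ack_vals[root] = root
        return root

    def bolt_ack(self, root: int, acked_id: int, emitted_ids: Iterable[int]) -> None:
        """A bolt acks acked_id after emitting children anchored to it."""
        val = acked_id
        for child in emitted_ids:
            val ^= child
        self.ack_vals[root] ^= val
        if self.ack_vals[root] == 0:
            del self.ack_vals[root]
            self.completed.add(root)  # here Storm would call the spout's ack()

    def pending(self) -> Set[int]:
        """Roots still outstanding; on timeout Storm fails these so the spout may replay."""
        return set(self.ack_vals)


if __name__ == "__main__":
    t = AckTracker(seed=42)
    root = t.spout_emit()
    a, b = t.new_id(), t.new_id()
    t.bolt_ack(root, root, [a, b])  # first bolt consumes the root and emits two anchored children
    t.bolt_ack(root, a, [])
    print(root in t.pending())      # True: child b is still outstanding
    t.bolt_ack(root, b, [])
    print(root in t.completed)      # True: the tree is fully acked
```

Because XOR is commutative, acks may arrive in any order, even a child before its parent, and the tracker still needs only constant memory per spout tuple. Forgetting to anchor, or to ack, in a bolt leaves the value nonzero until the timeout fires. That is why acking shows up as a practical challenge.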