Open Analytics Meetup: Alex Poon



  1. Storm @ Visual Revenue (an Outbrain Company), Alex Poon, VP of Engineering
  2. Who are we?
  3. What we do?
     • 14B page views per month
     • At peak, 8,000-10,000 per sec
     • Deployed Storm to production ~1 month ago
     • Storm cluster of ~50 instances on AWS
     [Diagram labels: Customer Traffic, Web Servers, Kafka, Data Transform/Aggregation, Storm, Databases, Dashboard, Algo, Automation]

  4. Before Storm
     • Built our own distributed data processing
       • ZMQ
       • Batch based process
       • Hashing processing by customers
     • Advantages
       • Simple in-house system built from very basic components
       • Well understood
     • Disadvantages
       • Hard to scale, constant battle for keeping up with pings
       • Machine management was clumsy
       • Uneven distribution of traffic
       • Multiple processes doing similar work, wasting resources
  5. Why Kafka/Storm?
     • Kafka
       • Open-source, distributed publish-subscribe messaging system
     • Storm
       • Open-source, real-time computation system for continuous computation
     • They are awesome
       • Distributed, highly scalable, and fault tolerant
       • High throughput
       • Reliable
       • Real-time
       • Great at in-memory analytics and real-time decision support
  6. Data Aggregation
     [Topology diagram. Legend: Spout, Bolt. Node labels: Customer, Position, Front Page, URL, Arrangement, Tweet, @Handle, Aggregate. Interval labels: 15s, 5m]
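
The 15s and 5m labels on slide 6 suggest counts collected in short fixed windows and then merged into longer ones. A minimal Python sketch of that idea follows, assuming fixed (tumbling) windows and plain in-memory counters. The function names and the sample events are hypothetical and are not taken from the Visual Revenue topology.

```python
from collections import Counter

SHORT_WINDOW = 15      # seconds, matching the "15s" label on the slide
LONG_WINDOW = 5 * 60   # seconds, matching the "5m" label on the slide


def bucket_start(timestamp, window):
    """Return the start of the fixed window that contains timestamp."""
    return timestamp - (timestamp % window)


def aggregate(events, window):
    """Count events per (key, window start).

    events: iterable of (timestamp_seconds, key) pairs.
    """
    counts = Counter()
    for timestamp, key in events:
        counts[(key, bucket_start(timestamp, window))] += 1
    return counts


def roll_up(counts, window):
    """Merge finer-grained window counts into coarser windows."""
    rolled = Counter()
    for (key, start), n in counts.items():
        rolled[(key, bucket_start(start, window))] += n
    return rolled


# Hypothetical page-view events: (epoch seconds, URL).
events = [
    (0, "/home"), (3, "/home"), (14, "/sports"),
    (15, "/home"), (299, "/home"), (300, "/home"),
]

per_15s = aggregate(events, SHORT_WINDOW)
per_5m = roll_up(per_15s, LONG_WINDOW)
```

The roll-up is exact only because 5 minutes is a whole multiple of 15 seconds, so every short window falls entirely inside one long window.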

  7. Learning / Ideas
     1. Kafka + ZooKeeper is extremely scalable and easy to set up. Check out the Brod library if you are doing Python
     2. Use the Storm UI (Ganglia based) to monitor your cluster
     3. Shell Bolts were inefficient and hard to debug (at least for us)
     4. Upgrade to at least Storm version 0.8.2, which gives you capacity metrics on top of other goodies
     5. Storm's anchoring/replay capability is awesome but comes with a visible overhead
     6. Use a good framework to manage your cluster; we use Salt Stack
     7. Our unit tests are built in JUnit. Most built-in unit tests for Storm are only available in Clojure for now
  8. Thank You. Alex Poon, @alexpoon06, @Outbrain. Yes, it is true. We are hiring! www.visualrevenue.com/jobs
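
As a footnote to slide 4, which lists "hashing processing by customers" alongside "uneven distribution of traffic": the Python sketch below illustrates how assigning work by a hash of the customer can leave one worker with most of the load when one customer dominates traffic. It is an illustration under assumptions, not the in-house system described in the talk. The CRC32 hash, the function names, the customer names, and the traffic figures are all hypothetical.

```python
import zlib


def worker_for(customer_id, num_workers):
    """Map a customer to a worker index using a stable hash (CRC32 here)."""
    return zlib.crc32(customer_id.encode("utf-8")) % num_workers


def load_per_worker(traffic, num_workers):
    """Sum page views per worker.

    traffic: dict mapping customer id to page views in some interval.
    """
    load = [0] * num_workers
    for customer_id, views in traffic.items():
        load[worker_for(customer_id, num_workers)] += views
    return load


# Hypothetical traffic: one large customer and three small ones.
traffic = {
    "big-publisher": 9000,
    "small-a": 300,
    "small-b": 400,
    "small-c": 300,
}

load = load_per_worker(traffic, num_workers=4)
```

Whichever worker receives the large customer carries at least 90% of the page views in this example, no matter how evenly the hash spreads the customer ids themselves.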

