Experience with Kafka & Storm

Experience with Kafka & Storm by Otto Mok

Transcript

  • 1. Target and Connect Intelligently Experience with Kafka & Storm Otto Mok Solution Architect, AcuityAds April 30, 2014 – Toronto Hadoop User Group
  • 2. 2 Agenda • Background – What does AcuityAds do? • Use case – What are we trying to do? • High-level System Architecture – How does the data flow? • Kafka & Storm – What did we do wrong?
  • 3. 3 Background Source: https://www.google.ca/search?q=banner+ads&tbm=isch&tbo=u
  • 4. 4 Background • Digital Advertising – Website banners, pre-roll video, free mobile apps • Buy ad impressions in real time – Response within 50ms for the auction • Find the best match between people and ads – Show ads that you care about • Use machine learning algorithms to ‘learn’ – Data, data, data
  • 5. 5 Use case • 10+ billion daily impressions • 30,000+ new sites daily • How many daily impressions by site? • How are the impressions distributed? – Country, Province, Gender, Age Range, etc...
  • 6. 6 High-level System Architecture • 10+ billion daily bid requests • Make up to 4 billion daily bids • Serve millions of daily impressions • 10+ TB of messages daily • 300k+ message / second Bidder Adserver Kafka Hbase/Hadoop Storm
  • 7. 7 Kafka Source: http://kafka.apache.org/documentation.html
  • 8. 8 Kafka - Spec • Kafka v0.8.0 • Servers – 10 x 2U (10 x 3TB JBOD) • Total storage – 300 TB • Replication – 3x • Unique data – 100 TB • Capacity – a few days • Producer acknowledgment – never waits • Topic - BIDREQUEST
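In Kafka 0.8, "producer acknowledgment – never waits" corresponds to `request.required.acks=0` (fire-and-forget). A minimal sketch of such a producer configuration, with hypothetical broker hostnames:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Kafka 0.8 producer properties matching the spec on this slide.
    // request.required.acks=0 means the producer never waits for a
    // broker acknowledgment, trading durability for throughput.
    static Properties bidRequestProducerProps() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // hypothetical hosts
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "0"); // 0 = never wait for acks
        return props;
    }
}
```

With acks=0 a broker crash can silently drop in-flight messages, which is an acceptable trade at 300k+ messages/second of bid-request data.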
  • 9. 9 Kafka - Monitoring • Nagios – Ping, CPU, memory, network I/O, disk space • Producer-Consumer group message counting – Hourly consumption rate check

    | Topic      | Consumer Group ID       | Producer Count | Consumer Count | Error           | Ratio |
    |------------|-------------------------|----------------|----------------|-----------------|-------|
    | BIDREQUEST | InventoryTopology       | 122,450,812    | 122,444,294    | None            | 1.00  |
    | BIDREQUEST | SearchTargetingTopology | 122,450,812    | 107,755,295    | Ratio below 98% | 0.88  |
  • 10. 10 Kafka - Monitoring • Kafka Web Console – Partition offset for each consumer group
  • 11. 11 Kafka - Issues • Issue 1 - Partitions – 10 partitions – Each partition > 1 TB a day – 100 TB / 1 TB – no problem! • But each partition is stored in a single directory on a single disk, so a large partition can fill its disk even when the cluster has space – /disk05/kafka-logs/BIDREQUEST-09 – /disk09/kafka-logs/BIDREQUEST-03
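The back-of-envelope arithmetic behind Issue 1 can be made explicit. Assuming roughly 1 TB/partition/day and "a few days" of retention (4 days is my illustrative number, not from the slide), a single partition's retained data already exceeds one 3 TB disk:

```java
public class PartitionSizing {
    // Total capacity looked fine (100 TB unique vs 300 TB raw),
    // but a partition lives entirely in one directory on one disk,
    // so the real limit is per-disk size, not the cluster total.
    static double retainedPerPartitionTB(double dailyPerPartitionTB, int retentionDays) {
        return dailyPerPartitionTB * retentionDays;
    }

    static boolean fitsOnDisk(double retainedTB, double diskTB) {
        return retainedTB <= diskTB;
    }
}
```

The usual remedies are more partitions (so each directory stays small) or shorter retention for the heaviest topics.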
  • 12. 12 Kafka - Issues • Issue 2 – Unbalanced partition distribution – Some servers running out of space – Some servers are not the “leader” for any partition • A network glitch caused a server to drop out of the cluster; after rejoining it was no longer leader for anything • auto.leader.rebalance.enable=true
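The setting named on the slide is a Kafka 0.8 broker option set in `server.properties`; a minimal fragment:

```properties
# server.properties (Kafka 0.8): periodically move partition
# leadership back to the preferred replica after brokers rejoin,
# instead of leaving all traffic on whichever brokers took over.
auto.leader.rebalance.enable=true
```

Without it, leadership stays wherever failover left it until an operator runs a manual preferred-replica election.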
  • 13. 13 Lots of data – now what? Source: http://bookriotcom.c.presscdn.com/wp-content/uploads/2013/03/server-farm-shot.jpg
  • 14. 14 Use case - again • 10+ billion daily impressions • 30,000+ new sites daily • How many daily impressions by site? • How are the impressions distributed? – Country, Province, Gender, Age Range, etc...
  • 15. 15 Storm Source: http://storm.incubator.apache.org/documentation/Tutorial.html
  • 16. 16 Storm - Spec • Storm v0.8.2 • Servers – 13 x Dual Quad Core Xeon 36G RAM • 4 worker slots per server • Total logical CPUs – 208 • Total memory – 468 G • Total slots – 52 worker slots (JVMs)
  • 17. 17 Storm - Monitor
  • 18. 18 Storm - Topology • Spout read each BidRequest from Kafka topic • Determine new or existing, emit tuples to different “streams”
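The new-vs-existing decision in the spout can be sketched as a small router. The stream names `NewInventory` and `ExistingInventory` come from the following slides; the in-memory known-site lookup is an assumption for illustration (the real topology presumably consults HBase):

```java
import java.util.HashSet;
import java.util.Set;

public class StreamRouter {
    // Sketch of the spout-side routing: a bid request for a site
    // we have already seen goes to the ExistingInventory stream,
    // anything else to NewInventory (and is then remembered).
    private final Set<String> knownSites = new HashSet<>();

    String route(String siteId) {
        if (knownSites.contains(siteId)) {
            return "ExistingInventory";
        }
        knownSites.add(siteId);
        return "NewInventory";
    }
}
```

In Storm terms, each return value would select the stream ID passed to `collector.emit(streamId, tuple)`.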
  • 19. 19 Storm - Topology • InsertInventoryBolt – Process tuples from NewInventory stream – Field grouping on sourceId, domainName – Tick tuple every 1 second • UpdateInventoryBolt – Process tuples from ExistingInventory stream – Field grouping on inventoryId – Tick tuple every 1 second
  • 20. 20 Storm - Topology • LogInventoryBolt – Process tuples from ExistingInventory stream – Field grouping on inventoryId – Tick tuple every 10 seconds
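The tick tuples these bolts rely on arrive on Storm's system component and tick stream (the per-bolt frequency is set via `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS`, e.g. 1s for the insert/update bolts and 10s for the log bolt). A bolt distinguishes ticks from data tuples by checking the tuple's source; a minimal sketch of that check:

```java
public class TickTuples {
    // Storm delivers tick tuples from the system component on the
    // system tick stream. "__system" and "__tick" are Storm's
    // built-in Constants.SYSTEM_COMPONENT_ID and
    // Constants.SYSTEM_TICK_STREAM_ID values.
    static boolean isTickTuple(String sourceComponent, String sourceStreamId) {
        return "__system".equals(sourceComponent)
            && "__tick".equals(sourceStreamId);
    }
}
```

In `execute()`, a tick typically triggers a batch flush to HBase, while ordinary tuples are just accumulated.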
  • 21. 21 Storm - Issues • Issue – Low uptime – 10 workers, 100 executors – Not processing many tuples – Process latency < 10ms • Bolt restarts due to uncaught Exceptions
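Since an exception escaping `execute()` kills the whole worker JVM (and resets the uptime counter), the usual fix is to catch everything inside the bolt and fail only the offending tuple. A standalone sketch of that pattern, where `processOne` stands in for the real bolt logic:

```java
import java.util.ArrayList;
import java.util.List;

public class SafeExecute {
    // Catch everything per tuple so one bad message cannot kill
    // the worker slot; return the tuples that failed so they can
    // be logged and failed/acked individually.
    static List<String> processAll(List<String> tuples) {
        List<String> failed = new ArrayList<>();
        for (String t : tuples) {
            try {
                processOne(t);
            } catch (Exception e) {
                failed.add(t); // in a real bolt: log, then collector.fail(tuple)
            }
        }
        return failed;
    }

    static void processOne(String t) {
        if (t == null || t.isEmpty()) {
            throw new IllegalArgumentException("bad tuple");
        }
    }
}
```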
  • 22. 22 Conclusion • Cost – Bleeding-edge technology → bugs – Support → mailing lists – Monitoring → roll your own – Operation → dedicated personnel • Benefit – Near real-time data on site impression volume & distribution by geo, demo, etc...
  • 23. 23 Forward Looking • Kafka v0.8.1.1 – Allows specifying the broker hostname for producers & consumers – Change the number of partitions of a topic online • Storm v0.9.1 – Faster pure-Java Netty transport – View logs from each server in the Storm UI – Tick tuples using floating-point seconds – Storm on Hadoop (HDP 2.1)
  • 24. 24 Thank you Otto Mok otto.mok@acuityads.com Source: http://jamesgieordano.files.wordpress.com/2011/05/babyelephant.jpg