
Kafka short

Published in: Software

  1. Kafka for BigData Processing. Yanai Franchi, Tikal
  2. Find “Hot” Places
  3.
  4. Gogobot Checkin Heat-Map Service: Let's Develop “Gogobot Checkins Heat-Map”
  5. Key Notes
     ● Collector Service: collects checkins as text addresses
       – We need to use a GeoLocation Service
     ● When each interval elapses, the list of locations from that interval is displayed as a heat map in the GUI.
     ● Web-scale service: tens of thousands of checkins per second from all over the world (imaginary, but let's do it for the exercise).
  6. Heat-Map Context: Text-Address Checkins, Heat-Map Service, Gogobot System, Gogobot Micro Service (three instances), Geo Location Service, Get-GeoCode(Address), Heat-Map, Last Interval Locations
  7. Tons of Addresses Arriving Every Second
  8. First Reaction...
  9. Checkin HTTP Reactor, Checkins Topic, Storm Heat-Map Topology, Hotzones Topic, Web App, Push via WebSocket, Publish Checkins, HDFS, Checkin HTTP Firehose
  10.
  11. They Are All Good, But Not for All Use Cases
  12. Kafka: A Little Introduction
  13.
  14. Why?
  15. LinkedIn Original Architecture
  16.
  17. What LinkedIn Wants...
  18. Looks Familiar: Use Messaging (e.g. JMS, RabbitMQ)
  19.
  20.
  21.
  22.
  23. It Didn't Scale...
  24. Paradigm Change: Do NOT Track Message Consumption
  25.
  26.
  27.
  28. Stateless Broker & Doesn't Fear the File System
  29. Topics
      ● Logical collections of partitions (the physical files).
      ● A broker contains some of the partitions for a topic.
  30. A Partition Is Consumed by Exactly One Consumer in Each Group
  31. Distributed & Fault-Tolerant
  32. Broker 1, Broker 2, Broker 3; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  33. Broker 1, Broker 2, Broker 3, Broker 4; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  34. Broker 1, Broker 2, Broker 3, Broker 4; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  35. Broker 1, Broker 2, Broker 3, Broker 4; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  36. Broker 1, Broker 2, Broker 3, Broker 4; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  37. Broker 1, Broker 2, Broker 3, Broker 4; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  38. Broker 1, Broker 2, Broker 3, Broker 4; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  39. Broker 1, Broker 2, Broker 3; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  40. Broker 1, Broker 2, Broker 3; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  41. Broker 1, Broker 2, Broker 3; ZooKeeper; Consumer 1, Consumer 2; Producer 1, Producer 2
  42. Broker 1, Broker 2, Broker 3; ZooKeeper; Consumer 1; Producer 1, Producer 2
  43. Broker 1, Broker 2, Broker 3; ZooKeeper; Consumer 1; Producer 1, Producer 2
  44. Broker 1, Broker 2, Broker 3; ZooKeeper; Consumer 1; Producer 1, Producer 2
  45. Performance Benchmark: 1 Broker, 1 Producer, 1 Consumer
  46.
  47.
  48. LinkedIn Kafka Performance (2012)
      ● 8 nodes per datacenter
        – ~20 GB RAM available for Kafka
        – 6 TB storage, RAID 10, basic SATA drives
      ● 10 billion messages/day
      ● Sustained peak:
        – 172,000 messages/second written
        – 950,000 messages/second read
      ● 367 topics
      ● 40 real-time consumers
      ● Many ad hoc consumers
      ● 9.5 TB log retained (~6 days)
      ● End-to-end delivery time: a few seconds
  49. Thanks
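
Slides 24, 29 and 30 state three ideas: the broker does not track what has been consumed, a topic is a collection of partitions, and each partition is read by exactly one consumer of a group. The sketch below is a toy in-memory model of those ideas, not Kafka's actual API. The names `PartitionLog`, `Consumer` and `assign_partitions`, and the round-robin assignment rule, are assumptions made for illustration only.

```python
from collections import defaultdict


class PartitionLog:
    """Toy append-only log. It keeps no per-consumer state (hypothetical class)."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written

    def read(self, offset, max_messages=10):
        # Reading never removes anything; the log does not know who has read what.
        return self.messages[offset:offset + max_messages]


def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer of a group.

    Round-robin is an assumption chosen for simplicity.
    """
    members = sorted(consumers)
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return dict(assignment)


class Consumer:
    """The consumer, not the broker, remembers its position in each partition."""

    def __init__(self, name):
        self.name = name
        self.offsets = defaultdict(int)  # partition id -> next offset to read

    def poll(self, partition_id, log, max_messages=10):
        batch = log.read(self.offsets[partition_id], max_messages)
        self.offsets[partition_id] += len(batch)
        return batch


# A "topic" with three partitions, holding six checkins spread across them.
logs = {p: PartitionLog() for p in range(3)}
for i in range(6):
    logs[i % 3].append(f"checkin-{i}")

assignment = assign_partitions(logs.keys(), ["c1", "c2"])
c1 = Consumer("c1")
first_batch = c1.poll(0, logs[0])
second_batch = c1.poll(0, logs[0])
```

In this model a second consumer group would simply create its own `Consumer` objects with their own offsets and re-read the same logs, which is why the log itself needs no bookkeeping per reader.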

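The figures on slide 48 can be related to each other with a little arithmetic. The sketch below assumes that "10 billion messages/day" counts messages written and that the retained log covers exactly 6 days; neither is stated on the slide, so the derived numbers are rough estimates only.

```python
# Figures quoted on slide 48 (LinkedIn, 2012).
MESSAGES_PER_DAY = 10_000_000_000   # assumption: counts messages written
PEAK_WRITES_PER_SECOND = 172_000
PEAK_READS_PER_SECOND = 950_000
RETAINED_LOG_TB = 9.5
RETENTION_DAYS = 6                  # slide says "~6 days"; treated as exact here

SECONDS_PER_DAY = 24 * 60 * 60

# Average write rate implied by the daily volume.
average_writes_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# How far the sustained peak sits above that average.
peak_to_average = PEAK_WRITES_PER_SECOND / average_writes_per_second

# Messages read per message written, comparing the two quoted peaks.
read_fanout = PEAK_READS_PER_SECOND / PEAK_WRITES_PER_SECOND

# Log volume retained per day of retention.
retained_tb_per_day = RETAINED_LOG_TB / RETENTION_DAYS

print(f"average writes/second: {average_writes_per_second:,.0f}")
print(f"peak / average writes: {peak_to_average:.2f}")
print(f"reads per write at peak: {read_fanout:.2f}")
print(f"TB retained per day: {retained_tb_per_day:.2f}")
```

Under these assumptions the daily volume averages roughly 115,741 writes per second, the sustained write peak is about 1.5 times that average, each message is read about 5.5 times at peak, and the log grows by about 1.6 TB per day.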