Online Media Data Stream Processing with Kafka

This talk was held at the third meeting of the Swiss Big Data User Group on September 17 at ETH Zürich.

Online Media Data Stream Processing with Kafka

  1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  2. Overview (18. September 2012)
     • What is Streaming Data?
     • Why Kafka?
     • Kafka Architecture
     • Use Case: Prospective Search
  3. About Sentric
     • Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland
     • Big Data expert, focused on Hadoop, HBase and Solr
     • Objective: Transforming data into insights
  4. CC 2.0 by audreyjm529 | http://flic.kr/p/mNMtL
  5. What is Streaming Data? Data Streams
     • Website Activity Data
       • User activity
       • Server activity
     • Social Media Data
     • News Data
     • …
     • How to Analyze in Real-Time?
  6. What is Streaming Data? Offline vs. Online
     Diagram: time axis t with a "now" marker; labels "Offline (Hadoop/MR)" and "Online (Kafka)"
  7. CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy
  8. Why Kafka? Streaming Systems
     • Message Queues (RabbitMQ, ActiveMQ)
       • do not scale / have no persistence
     • Flume / Scribe
       • Log-Aggregation only, high throughput and scalable, push model
       • Focus on offline consumption
     • Kafka
       • High throughput and scalable, pull model
       • Different consumption profiles
  9. Why Kafka? Consumer Performance
     Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  10. CC 2.0 by Presidente | http://flic.kr/p/2ptSZ
  11. Kafka Architecture: Key Concepts
      • Messaging System
      • Publish-Subscribe
      • Persistent
      • High-Throughput
  12. Kafka Architecture: Messaging
      Diagram labels: several Producers, a Broker, several Consumers, ZooKeeper; arrows labeled "Push" (Producer side) and "Pull" (Consumer side)
  13. Kafka Architecture: Publish-Subscribe
      Diagram labels: Topics (logs, …, page-views); Msg, Msg, Msg; Consumer, Consumer, Consumer
  14. Kafka Architecture: Persistent
      • Persists messages to disk
        • Topic is base abstraction
        • Binary write-ahead log
        • No message ID
        • Message offset ID (byte position)
      • Messages retained for a specific time
        • Default is 7 days
  15. Kafka Architecture: High-Throughput
      • API Simplicity
        • Append message
        • Fetch message from given byte position
      • Batching
      • Stateless Broker
      • O(1) disk access (no seeks)
      • Use of operating system features
  16. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  17. Prospective Search: Solution Architecture
      Diagram labels: n News Agents, Kafka, REST, RT Alerts, Web-UI, HBase, MySQL, Solr
      Icons by http://dryicons.com
  18. Prospective Search: Prospective Search with Kafka
      Diagram labels: Kafka, Consumer, Pull (Batch), Processing, Prospective Search, RT Alerts
      Icons by http://dryicons.com
  19. Resources to get started
      • http://incubator.apache.org/kafka/
      • http://sites.computer.org/debull/A12june/A12JUN-CD.pdf
  20. Thank you!
      Questions?
      Christian Gügi, christian.guegi@sentric.ch
      Swiss Big Data User Group
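
Slides 14 and 15 describe the broker's storage as an append-only log in which a message has no separate ID and is addressed by its byte position, with an API of just "append" and "fetch from a given byte position". The following is a minimal in-memory Python sketch of that idea, not Kafka's actual implementation or client API. The class name ByteOffsetLog, the 4-byte length prefix, and the max_bytes parameter are assumptions made for illustration only.

```python
import struct


class ByteOffsetLog:
    """Toy append-only log; a message is addressed by the byte position where it starts."""

    # Assumed framing for this sketch: 4-byte big-endian length prefix per message.
    _HEADER = struct.Struct(">I")

    def __init__(self):
        self._buf = bytearray()

    def append(self, payload: bytes) -> int:
        """Append one message and return the byte offset it was written at."""
        offset = len(self._buf)
        self._buf += self._HEADER.pack(len(payload)) + payload
        return offset

    def fetch(self, offset: int, max_bytes: int = 1024):
        """Return (messages, next_offset) for whole messages starting at `offset`."""
        messages = []
        end = min(len(self._buf), offset + max_bytes)
        pos = offset
        while pos + self._HEADER.size <= end:
            (length,) = self._HEADER.unpack_from(self._buf, pos)
            start = pos + self._HEADER.size
            if start + length > end:
                break  # message does not fit in this batch; caller retries from `pos`
            messages.append(bytes(self._buf[start:start + length]))
            pos = start + length
        return messages, pos


log = ByteOffsetLog()
first = log.append(b"page-view:/home")
second = log.append(b"page-view:/news")

# The consumer, not the log, remembers how far it has read.
batch, next_offset = log.fetch(first)
```

In this sketch the log keeps no per-consumer state: each consumer holds its own next_offset and passes it to the next fetch call, which is one way to read the "Stateless Broker" and "pull model" points on the slides. A single fetch can return several messages at once, which loosely mirrors the "Batching" bullet.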
