Online Media Data Stream Processing with Kafka

This talk was held at the third meeting of the Swiss Big Data User Group on September 17 at ETH Zürich.

Transcript

  • 1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  • 2. Overview (18. September 2012)
      • What is Streaming Data?
      • Why Kafka?
      • Kafka Architecture
      • Use Case: Prospective Search
  • 3. About Sentric
      • Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland
      • Big Data expert, focused on Hadoop, HBase and Solr
      • Objective: Transforming data into insights
  • 4. CC 2.0 by audreyjm529 | http://flic.kr/p/mNMtL
  • 5. What is Streaming Data? Data Streams
      • Website Activity Data
          • User activity
          • Server activity
      • Social Media Data
      • News Data
      • …
      • How to Analyze in Real-Time?
  • 6. What is Streaming Data? Offline vs. Online
      • [Diagram: time axis t with "now" marked; Offline (Hadoop/MR), Online (Kafka)]
  • 7. CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy  
  • 8. Why Kafka? Streaming Systems
      • Message Queues (RabbitMQ, ActiveMQ)
          • Do not scale / have no persistence
      • Flume / Scribe
          • Log aggregation only, high throughput and scalable, push model
          • Focus on offline consumption
      • Kafka
          • High throughput and scalable, pull model
          • Different consumption profiles
  • 9. Why Kafka? Consumer Performance
      • Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  • 10. CC 2.0 by Presidente | http://flic.kr/p/2ptSZ  
  • 11. Kafka Architecture: Key Concepts
      • Messaging System
      • Publish-Subscribe
      • Persistent
      • High-Throughput
  • 12. Kafka Architecture: Messaging
      • [Diagram: Producers push to the Broker; Consumers pull from the Broker; ZooKeeper]
  • 13. Kafka Architecture: Publish-Subscribe
      • [Diagram: Topics (logs, …, page-views); messages (Msg) delivered to Consumers]
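The publish-subscribe idea on this slide can be illustrated with a small in-memory sketch. It is not Kafka's API: `TinyBroker`, `publish`, and `pull` are hypothetical names chosen for illustration, and the position here is a simple list index rather than a byte offset. The point is only that messages are grouped by topic and that each consumer pulls at its own pace while keeping its own position.

```python
from collections import defaultdict


class TinyBroker:
    """Hypothetical in-memory stand-in for a broker; not the Kafka API."""

    def __init__(self):
        # One ordered message list per topic name.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Producers push a message onto the end of a topic."""
        self._topics[topic].append(message)

    def pull(self, topic, position, max_messages=10):
        """Consumers pull a batch starting at their own position.

        Returns the batch and the position to use for the next pull.
        """
        log = self._topics[topic]
        batch = log[position:position + max_messages]
        return batch, position + len(batch)


broker = TinyBroker()
broker.publish("page-views", "/home")
broker.publish("page-views", "/about")
broker.publish("logs", "GET /home 200")

# Two independent consumers of the same topic, each with its own position.
alerts_pos = 0
stats_pos = 0
alerts_batch, alerts_pos = broker.pull("page-views", alerts_pos)
stats_batch, stats_pos = broker.pull("page-views", stats_pos, max_messages=1)
```

Because the position lives with the consumer, a slow consumer does not hold back a fast one, and both eventually see every message on the topic.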
  • 14. Kafka Architecture: Persistent
      • Persists messages to disk
          • Topic is base abstraction
          • Binary write-ahead log
          • No message ID
          • Message offset ID (byte position)
      • Messages retained for a specific time
          • Default is 7 days
  • 15. Kafka Architecture: High-Throughput
      • API Simplicity
          • Append message
          • Fetch message from given byte position
      • Batching
      • Stateless Broker
      • O(1) disk access (no seeks)
      • Use of operating system features
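The two-call API described on this slide (append a message, fetch from a given byte position) can be sketched as an in-memory append-only log. This is a minimal illustration under assumptions, not Kafka's storage format: the class name `ByteOffsetLog`, the 4-byte length prefix, and the `max_bytes` parameter are all hypothetical. It shows why no separate message ID is needed when the byte position itself addresses a message, and why the broker can stay stateless when the consumer carries the next offset.

```python
import struct


class ByteOffsetLog:
    """Hypothetical append-only log; a message is addressed by its byte position."""

    HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian payload length

    def __init__(self):
        self._data = bytearray()

    def append(self, payload: bytes) -> int:
        """Append one message and return the byte offset where it starts."""
        offset = len(self._data)
        self._data += self.HEADER.pack(len(payload)) + payload
        return offset

    def fetch(self, offset: int, max_bytes: int = 1024):
        """Return (messages, next_offset) for whole messages starting at offset."""
        messages = []
        end = min(len(self._data), offset + max_bytes)
        pos = offset
        while pos + self.HEADER.size <= end:
            (length,) = self.HEADER.unpack_from(self._data, pos)
            start = pos + self.HEADER.size
            if start + length > end:
                break  # partial message: the consumer fetches again from pos
            messages.append(bytes(self._data[start:start + length]))
            pos = start + length
        return messages, pos


log = ByteOffsetLog()
first = log.append(b"page-view:/home")
second = log.append(b"page-view:/about")

# The consumer, not the log, remembers where to continue reading.
batch, next_offset = log.fetch(first)
```

A single fetch returns every complete message that fits in `max_bytes`, which is the batching the slide lists alongside the simple API.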
  • 16. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  • 17. Prospective Search: Solution Architecture
      • [Diagram: n News Agents, Kafka, REST, RT Alerts, Web-UI, HBase, MySQL, Solr; icons by http://dryicons.com]
  • 18. Prospective Search: Prospective Search with Kafka
      • [Diagram: Kafka, Consumer, Pull (Batch), Processing, Prospective Search, RT Alerts; icons by http://dryicons.com]
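Prospective search inverts the usual search flow: queries are stored up front and every incoming document is matched against them. The sketch below illustrates that idea for one pulled batch. It is not the speaker's implementation, which the slides do not show: `ProspectiveSearch`, `register`, `match`, and `process_batch` are hypothetical names, the matching rule (all query terms must appear in the document) is an assumption, and the batch is a plain list standing in for messages pulled by a consumer.

```python
def tokenize(text):
    """Lower-case whitespace tokenizer; real systems would use a proper analyzer."""
    return set(text.lower().split())


class ProspectiveSearch:
    """Hypothetical matcher: stored queries are checked against each new document."""

    def __init__(self):
        self._queries = {}  # query id -> set of required terms

    def register(self, query_id, terms):
        """Store a query; it matches documents containing all of its terms."""
        self._queries[query_id] = tokenize(terms)

    def match(self, document):
        """Return the sorted ids of all stored queries that the document satisfies."""
        words = tokenize(document)
        return sorted(qid for qid, terms in self._queries.items() if terms <= words)


def process_batch(search, batch):
    """Match one pulled batch of documents and collect (query id, document) alerts."""
    alerts = []
    for document in batch:
        for query_id in search.match(document):
            alerts.append((query_id, document))
    return alerts


search = ProspectiveSearch()
search.register("kafka-news", "kafka release")
search.register("zurich", "zürich")

# Stand-in for one batch pulled by a consumer.
batch = [
    "Apache Kafka release announced",
    "Big Data meetup at ETH Zürich",
    "Weather report",
]
alerts = process_batch(search, batch)
```

Each alert pairs a stored query with the document that triggered it, which is the information a real-time alerting step would need downstream.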
  • 19. Resources to get started
      • http://incubator.apache.org/kafka/
      • http://sites.computer.org/debull/A12june/A12JUN-CD.pdf
  • 20. Swiss Big Data User Group: Thank you!
      • Questions?
      • Christian Gügi, christian.guegi@sentric.ch