Online Media Data Stream Processing with Kafka
This talk was held at the third meeting of the Swiss Big Data User Group on September 17 at ETH Zürich.

Usage Rights

© All Rights Reserved

Online Media Data Stream Processing with Kafka Presentation Transcript

  • 1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  • 2. Overview (18 September 2012)
      • What is Streaming Data?
      • Why Kafka?
      • Kafka Architecture
      • Use Case: Prospective Search
  • 3. About Sentric
      • Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland
      • Big Data expert, focused on Hadoop, HBase and Solr
      • Objective: Transforming data into insights
  • 4. CC 2.0 by audreyjm529| http://flic.kr/p/mNMtL  
  • 5. What is Streaming Data? Data Streams
      • Website Activity Data
          • User activity
          • Server activity
      • Social Media Data
      • News Data
      • …
      • How to Analyze in Real-Time?
  • 6. What is Streaming Data? Offline vs. Online
      [Diagram: time axis t with a "now" marker, contrasting Offline (Hadoop/MR) with Online (Kafka).]
  • 7. CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy  
  • 8. Why Kafka? Streaming Systems
      • Message Queues (RabbitMQ, ActiveMQ)
          • do not scale / have no persistence
      • Flume / Scribe
          • Log-Aggregation only, high throughput and scalable, push model
          • Focus on offline consumption
      • Kafka
          • High throughput and scalable, pull model
          • Different consumption profiles
  • 9. Why Kafka? Consumer Performance
      [Chart not reproduced.]
      Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  • 10. CC 2.0 by Presidente | http://flic.kr/p/2ptSZ  
  • 11. Kafka Architecture: Key Concepts
      • Messaging System
      • Publish-Subscribe
      • Persistent
      • High-Throughput
  • 12. Kafka Architecture: Messaging
      [Diagram: Producers push messages to the Broker; Consumers pull from it; ZooKeeper is also shown.]
  • 13. Kafka Architecture: Publish-Subscribe
      [Diagram: Topics (logs, …, page-views) holding messages (Msg), read by several Consumers.]
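
The publish-subscribe slide can be illustrated with a toy in-memory model. This is only a sketch: `TinyBroker`, `publish` and `pull` are hypothetical names invented for illustration and are not the Kafka API. It assumes that every subscriber of a topic sees all of its messages and that each subscriber remembers its own read position.

```python
from collections import defaultdict


class TinyBroker:
    """Toy in-memory stand-in for a broker (hypothetical, not the Kafka API)."""

    def __init__(self):
        # topic name -> ordered list of messages
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        """Producers push: append the message to the end of the topic."""
        self.topics[topic].append(message)

    def pull(self, topic, position, max_messages=10):
        """Consumers pull: return messages from `position` and the next position."""
        batch = self.topics[topic][position:position + max_messages]
        return batch, position + len(batch)


broker = TinyBroker()
broker.publish("logs", "GET /index.html 200")
broker.publish("page-views", "user=42 page=/home")
broker.publish("page-views", "user=7 page=/news")

# Two independent subscribers of the same topic, each with its own position.
alerts_pos, stats_pos = 0, 0
alerts_batch, alerts_pos = broker.pull("page-views", alerts_pos)
stats_batch, stats_pos = broker.pull("page-views", stats_pos, max_messages=1)
```

Because the read position lives with the subscriber, a slow consumer does not hold back a fast one; each simply asks for the next batch when it is ready.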
  • 14. Kafka Architecture: Persistent
      • Persists messages to disk
          • Topic is base abstraction
          • Binary write-ahead log
          • No message ID
          • Message offset ID (byte position)
      • Messages are retained for a specific time
          • Default is 7 days
  • 15. Kafka Architecture: High-Throughput
      • API Simplicity
          • Append message
          • Fetch message from given byte position
      • Batching
      • Stateless Broker
      • O(1) disk access (no seeks)
      • Use of operating system features
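
The persistence slide describes messages addressed by byte position and dropped after a retention period. A minimal sketch of that idea follows. `SegmentedLog`, its 4-byte length prefix, and expiry by segment creation time are assumptions made for illustration; they are not Kafka's actual on-disk format or retention logic.

```python
import struct

RETENTION_SECONDS = 7 * 24 * 60 * 60  # the 7-day default mentioned on the slide


class SegmentedLog:
    """Toy append-only log: a message is addressed by its byte offset, not an ID."""

    def __init__(self):
        self.segments = []    # list of (base_offset, created_at, bytearray)
        self.next_offset = 0  # byte position where the next message will start

    def roll(self, now):
        """Start a new segment; old segments can then expire as a whole."""
        self.segments.append((self.next_offset, now, bytearray()))

    def append(self, payload, now):
        """Append one message and return its byte offset."""
        if not self.segments:
            self.roll(now)
        _, _, data = self.segments[-1]
        offset = self.next_offset
        data.extend(struct.pack(">I", len(payload)) + payload)
        self.next_offset += 4 + len(payload)
        return offset

    def read(self, offset):
        """Return (payload, offset of the following message)."""
        for base, _, data in self.segments:
            if base <= offset < base + len(data):
                pos = offset - base
                (length,) = struct.unpack_from(">I", data, pos)
                return bytes(data[pos + 4:pos + 4 + length]), offset + 4 + length
        raise KeyError(f"offset {offset} is not retained")

    def expire(self, now):
        """Drop whole segments older than the retention period."""
        self.segments = [s for s in self.segments if now - s[1] <= RETENTION_SECONDS]


log = SegmentedLog()
first = log.append(b"hello", now=0)
second = log.append(b"kafka", now=1)
payload, next_offset = log.read(first)
```

In this sketch the offset returned by `read` is exactly where the next message starts, so a reader can walk the log without any per-message ID.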
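
The two API operations on the high-throughput slide, appending and fetching from a byte position, can be sketched against an ordinary file. The helper names `append_batch` and `fetch`, the length-prefixed framing, and the `max_bytes` parameter are assumptions for illustration only. The point is that the caller supplies the offset, so the serving side keeps no per-consumer state, and that several messages travel in one write or one read.

```python
import os
import struct
import tempfile


def append_batch(path, payloads):
    """Append several messages with one write; return the byte offset of the first."""
    start = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as f:
        f.write(b"".join(struct.pack(">I", len(p)) + p for p in payloads))
    return start


def fetch(path, offset, max_bytes=4096):
    """Read up to max_bytes from `offset`; return complete messages and the next offset."""
    with open(path, "rb") as f:
        f.seek(offset)  # go straight to the requested byte position
        chunk = f.read(max_bytes)
    messages, pos = [], 0
    while pos + 4 <= len(chunk):
        (length,) = struct.unpack_from(">I", chunk, pos)
        if pos + 4 + length > len(chunk):
            break  # partial message at the end of the chunk; a later fetch picks it up
        messages.append(chunk[pos + 4:pos + 4 + length])
        pos += 4 + length
    return messages, offset + pos


path = os.path.join(tempfile.mkdtemp(), "page-views.log")
append_batch(path, [b"a", b"bb", b"ccc"])

# The consumer, not the broker, remembers where it left off.
batch, next_offset = fetch(path, 0, max_bytes=12)
rest, end_offset = fetch(path, next_offset)
```

The first fetch is deliberately capped at 12 bytes, so it returns only the two messages that fit completely; the third is delivered by the second fetch, which resumes at the returned offset.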
  • 16. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  • 17. Prospective Search: Solution Architecture
      [Diagram labels: n News Agents, Kafka, REST, RT Alerts, Web-UI, HBase, MySQL, Solr. Icons by http://dryicons.com]
  • 18. Prospective Search: Prospective Search with Kafka
      [Diagram labels: Kafka, Consumer, Pull (Batch), Processing, Prospective Search, RT Alerts. Icons by http://dryicons.com]
  • 19. Resources to get started
      • http://incubator.apache.org/kafka/
      • http://sites.computer.org/debull/A12june/A12JUN-CD.pdf
  • 20. Thank you! Questions?
      Christian Gügi, christian.guegi@sentric.ch
      Swiss Big Data User Group
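
The slides do not show how the prospective search step works internally. As a rough sketch, assume it means the reverse of ordinary search: queries are stored up front, and each incoming document in a pulled batch is matched against them to produce alerts. The function names, the whitespace tokenizer, and the "all terms must appear" rule below are hypothetical simplifications, not the speaker's implementation.

```python
def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def match_batch(batch, stored_queries):
    """Return (query name, document) pairs for every stored query a document satisfies."""
    alerts = []
    for doc in batch:
        terms = tokenize(doc)
        for name, required_terms in stored_queries.items():
            if required_terms <= terms:  # all required terms occur in the document
                alerts.append((name, doc))
    return alerts


# Stored queries (hypothetical): name -> set of terms that must all be present.
stored_queries = {
    "kafka-news": {"kafka"},
    "eth-bigdata": {"eth", "big", "data"},
}

# One batch as a consumer might pull it from a topic.
batch = [
    "Kafka talk at ETH",
    "Big Data meetup at ETH Zürich",
    "Weather update",
]

alerts = match_batch(batch, stored_queries)
```

Matching a whole pulled batch at once fits the "Pull (Batch)" label on the slide: the consumer decides how much to take, runs the stored queries over it, and forwards any hits as real-time alerts.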