Online Media Data Stream Processing with Kafka

This talk was held at the third meeting of the Swiss Big Data User Group on September 17 at ETH Zürich.

Published in: Technology, Business

  1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  2. Overview (18 September 2012)
       • What is Streaming Data?
       • Why Kafka?
       • Kafka Architecture
       • Use Case: Prospective Search
  3. About Sentric
       • Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland
       • Big Data expert, focused on Hadoop, HBase and Solr
       • Objective: Transforming data into insights
  4. CC 2.0 by audreyjm529 | http://flic.kr/p/mNMtL
  5. What is Streaming Data? Data Streams
       • Website Activity Data
           • User activity
           • Server activity
       • Social Media Data
       • News Data
       • …
       • How to Analyze in Real-Time?
  6. What is Streaming Data? Offline vs. Online
       • Diagram: time axis t with a "now" marker; Offline (Hadoop/MR) and Online (Kafka)
  7. CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy
  8. Why Kafka? Streaming Systems
       • Message Queues (RabbitMQ, ActiveMQ)
           • Do not scale / have no persistence
       • Flume / Scribe
           • Log aggregation only, high throughput and scalable, push model
           • Focus on offline consumption
       • Kafka
           • High throughput and scalable, pull model
           • Different consumption profiles
  9. Why Kafka? Consumer Performance
       • Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  10. CC 2.0 by Presidente | http://flic.kr/p/2ptSZ
  11. Kafka Architecture: Key Concepts
       • Messaging System
       • Publish-Subscribe
       • Persistent
       • High-Throughput
  12. Kafka Architecture: Messaging
       • Diagram labels: ZooKeeper, Producer (×4), Broker, Consumer (×3), Push, Pull
  13. Kafka Architecture: Publish-Subscribe
       • Diagram labels: Topics (logs, …, page-views), Msg (×3), Consumer (×3)
  14. Kafka Architecture: Persistent
       • Persists messages to disk
           • Topic is the base abstraction
           • Binary write-ahead log
           • No message ID
           • Message offset as ID (byte position)
       • Messages are retained for a specific time
           • Default is 7 days
  15. Kafka Architecture: High-Throughput
       • API simplicity
           • Append message
           • Fetch message from given byte position
       • Batching
       • Stateless broker
       • O(1) disk access (no seeks)
       • Use of operating system features
  16. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  17. Prospective Search: Solution Architecture
       • Diagram labels: n News Agents, Kafka, REST, RT Alerts, Web-UI, HBase, MySQL, Solr
       • Icons by http://dryicons.com
  18. Prospective Search: Prospective Search with Kafka
       • Diagram labels: Processing, Pull (Batch), Prospective Search, RT Alerts, Kafka Consumer
       • Icons by http://dryicons.com
  19. Resources to get started
       • http://incubator.apache.org/kafka/
       • http://sites.computer.org/debull/A12june/A12JUN-CD.pdf
  20. Thank you! (Swiss Big Data User Group)
       • Questions?
       • Christian Gügi, christian.guegi@sentric.ch
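
The "Persistent" and "High-Throughput" slides describe an append-only log in which a message is identified by its byte position and the broker keeps no per-consumer state. The following is a minimal Python sketch of that idea, not Kafka's actual implementation or API. The names `TopicLog`, `Consumer`, `append`, `fetch` and `poll` are hypothetical, the 4-byte length prefix is an assumed framing, and an in-memory `bytearray` stands in for the on-disk log file.

```python
import struct


class TopicLog:
    """Hypothetical in-memory stand-in for one topic's append-only log."""

    def __init__(self):
        self._data = bytearray()

    def append(self, payload: bytes) -> int:
        """Append one message and return the byte offset where it starts."""
        offset = len(self._data)
        # Assumed framing: 4-byte big-endian length, then the payload.
        self._data += struct.pack(">I", len(payload)) + payload
        return offset

    def fetch(self, offset: int, max_bytes: int = 1024):
        """Return (messages, next_offset) for whole messages from `offset`.

        The log stores nothing about who is reading; the caller supplies
        the byte position on every request.
        """
        messages = []
        end = min(len(self._data), offset + max_bytes)
        pos = offset
        while pos + 4 <= end:
            (size,) = struct.unpack_from(">I", self._data, pos)
            if pos + 4 + size > end:
                break  # message does not fit in this batch
            messages.append(bytes(self._data[pos + 4 : pos + 4 + size]))
            pos += 4 + size
        return messages, pos


class Consumer:
    """Pull-model consumer that tracks its own position in the log."""

    def __init__(self, log: TopicLog, offset: int = 0):
        self.log = log
        self.offset = offset

    def poll(self, max_bytes: int = 1024):
        """Pull the next batch and advance the locally stored offset."""
        messages, self.offset = self.log.fetch(self.offset, max_bytes)
        return messages


log = TopicLog()
first = log.append(b"page-view:/home")
second = log.append(b"page-view:/news")

alerts = Consumer(log)   # e.g. a real-time consumer
archive = Consumer(log)  # e.g. a slower, batch-oriented consumer

batch_a = alerts.poll()            # the two messages appended so far
log.append(b"page-view:/about")
batch_b = alerts.poll()            # only the newly appended message
batch_c = archive.poll()           # all three, read in one batch
```

Because each consumer holds its own offset, the two consumers read the same log at different paces without the log tracking either of them, which is one way to picture the "different consumption profiles" mentioned on the "Streaming Systems" slide.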
