Dominik Gruber, @the_dom
Scala Vienna User Group – April 15, 2015
Apache Kafka

2015-04-15 | Apache Kafka (Vienna Scala User Group)

841 views

Introduction to Apache Kafka

Published in: Internet

  1. Apache Kafka (Dominik Gruber, @the_dom; Scala Vienna User Group – April 15, 2015)
  2. Apache Kafka
     • Originally developed by LinkedIn
     • Open sourced in 2011
     • Written in Scala
     • Clients for every popular language
     • Version 0.8.2.1
     • http://kafka.apache.org
  3. Users
     • Everyone, really …
     • LinkedIn, Yahoo!, Twitter, Netflix, Square, Spotify, Pinterest, Uber, Goldman Sachs, Tumblr, PayPal, Box, Airbnb, Mozilla, Cisco, Foursquare, …
     • https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
  4. Apache Kafka
     “A high throughput distributed messaging system.”
  5. Apache Kafka
     “Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.”
  6. Apache Kafka
     “Kafka is a distributed, partitioned, replicated commit log service.”
  7. Claims
     • Fast
     • Scalable
     • Durable
     • Distributed by Design
  8. Claims
     • Fast
       • A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
     • Scalable
     • Durable
     • Distributed by Design
  9. Claims
     • Fast
     • Scalable
       • Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine (…)
     • Durable
     • Distributed by Design
  10. Claims
     • Fast
     • Scalable
     • Durable
       • Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
     • Distributed by Design
  11. Claims
     • Fast
     • Scalable
     • Durable
     • Distributed by Design
       • Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
  12. Design
     http://kafka.apache.org/documentation.html
  13. Design
     http://kafka.apache.org/documentation.html
  14. Design
     “The performance of linear writes on a JBOD configuration with six 7200rpm SATA RAID-5 array is about 600MB/sec but the performance of random writes is only about 100k/sec, a difference of over 6000X.”
  15. Design
     http://kafka.apache.org/documentation.html
  16. Use Cases
     “We designed Kafka to be able to act as a unified platform for handling all the real-time data feeds a large company might have.”
  17. Use Cases
     • Messaging
     • Website Activity Tracking
     • Metrics
     • Log Aggregation
     • Stream Processing
     • Event Sourcing
     • Commit Log
  18. Demo
  19. Q & A
  20. Further reading
     • http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
     • http://blog.confluent.io/2015/04/07/hands-free-kafka-replication-a-lesson-in-operational-simplicity
     • http://www.slideshare.net/wangxia5/netflix-kafka
     • https://metamarkets.com/2015/simplicity-stability-and-transparency-how-samza-makes-data-integration-a-breeze
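
The slides describe Kafka as a "distributed, partitioned, replicated commit log" whose data streams are "partitioned and spread over a cluster of machines". As a rough illustration of the partitioned, append-only part of that idea, here is a minimal in-memory sketch in Python. It is not Kafka's API: the names `Partition`, `Topic`, `partition_for` and `send` are hypothetical, the CRC32-modulo partitioning rule is an assumption chosen only because it is deterministic, and replication, persistence to disk and networking are left out entirely.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only list of messages; the list index serves as the offset."""
    messages: list = field(default_factory=list)

    def append(self, message: bytes) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written

    def read(self, offset: int, max_messages: int = 10) -> list:
        # Reading does not remove anything; a consumer tracks its own offset.
        return self.messages[offset:offset + max_messages]


class Topic:
    """Hypothetical in-memory topic split into a fixed number of partitions."""

    def __init__(self, num_partitions: int):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption for this sketch: CRC32 of the key, modulo partition count.
        return zlib.crc32(key) % len(self.partitions)

    def send(self, key: bytes, message: bytes) -> tuple:
        # Same key -> same partition, so per-key ordering is preserved.
        p = self.partition_for(key)
        return p, self.partitions[p].append(message)


topic = Topic(num_partitions=3)
p1, o1 = topic.send(b"user-42", b"page_view:/home")
p2, o2 = topic.send(b"user-42", b"page_view:/cart")
```

In this sketch, messages with the same key land in the same partition and receive consecutive offsets, and a reader can re-read from any offset because reads never delete data. In a real cluster each partition would additionally be stored on disk and replicated across brokers, as slide 10 states.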
