Apache Kafka: A Distributed Streaming Platform


Paolo Castagna is a Senior Sales Engineer at Confluent. His background is in 'big data', and he has seen first hand the industry shift from batch to stream processing and from big data to fast data. His talk introduces Kafka Streams and explains why Apache Kafka is a strong and simpler option for stream processing.

Published in: Software


  1. Apache Kafka: A Distributed Streaming Platform. StreamProcessing.be, Belgium. Wednesday, 18th January 2017. <paolo@confluent.io>
  2. Industry shift from Big Data to Fast Data and Stream Processing. https://www.confluent.io/blog/stream-data-platform-1/
  3. Apache Kafka APIs and UNIX analogy: $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  4. Apache Kafka APIs and UNIX analogy, Connect APIs: $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  5. Apache Kafka APIs and UNIX analogy, Producer/Consumer APIs: $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  6. Apache Kafka APIs and UNIX analogy, Streams APIs: $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  7. Streams APIs are part of Apache Kafka. http://kafka.apache.org/documentation/streams
  8. Build applications, not clusters: <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-streams</artifactId> <version>0.10.1.1</version> </dependency>
  9. Spot the difference(s)
  10. How do I run in production?
  11. How do I run in production? Like any other Java application...
  12. How do I run in production? Uncool / Cool
  13. Typical High Level Architecture
  14. Typical High Level Architecture: Real-time Data Ingestion
  15. Typical High Level Architecture: Stream Processing; Storage; Real-time Data Ingestion
  16. Typical High Level Architecture: Data Publishing / Visualization; Stream Processing; Storage; Real-time Data Ingestion
  17. How many clusters do you count? NoSQL (Cassandra, HBase, Couchbase, MongoDB, …) or Elasticsearch, Solr, …; Storm, Flink, Spark Streaming, Ignite, Akka Streams, Apex, …; HDFS, NFS, Ceph, GlusterFS, Lustre, ...; Apache Kafka
  18. Simplicity is the ultimate sophistication. Apache Kafka, Distributed Streaming Platform: Publish & Subscribe to streams of data like a messaging system; Store streams of data safely in a distributed replicated cluster; Process streams of data efficiently and in real-time. [diagram label: Node.js]
  19. Apache Kafka and Streams APIs benefits • Build applications, not clusters • Native integration with Apache Kafka • Elastic, fast, distributed, fault-tolerant, secure • Scalable: S, M, L, XL, XXL • Run everywhere: from containers to cloud • Streams (with KStream) and tables (with KTable) • Local state replicated to Kafka for fault-tolerance • Windowing and event time semantics out of the box • Supports late-arriving and out-of-order events
  20. Apache Kafka adoption across the industry… everybody loves simplicity!
  21. References • http://kafka.apache.org/ • http://kafka.apache.org/documentation/streams • http://docs.confluent.io/ • http://docs.confluent.io/current/streams/ • http://blog.confluent.io/ • http://github.com/confluentinc/examples • http://github.com/apache/kafka/tree/trunk/streams
  22. References
  23. The easiest way to get you started: https://www.confluent.io/download/
  24. SIMPLICITY WE
  25. YOUR FEEDBACK!
  26. Discount code: kafcom17. Use the Apache Kafka community discount code to get $50 off. www.kafka-summit.org. Kafka Summit New York: May 8. Kafka Summit San Francisco: August 28.
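
The UNIX analogy on slides 3 to 6 maps the ends of the pipe (in.txt, out.txt) to the Connect APIs and the middle stages (grep, tr) to the Streams APIs. As a rough illustration of that shape only, here is a minimal sketch in plain Java with no Kafka dependency. The class name PipelineAnalogy and the method transform are hypothetical and do not come from the slides or from any Kafka API; an in-memory list stands in for the input and output files.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Plain-Java stand-in for: cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
// No Kafka classes are used; the class and method names are hypothetical.
final class PipelineAnalogy {

    // The two middle stages of the pipe: filter, then transform each line.
    static List<String> transform(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("apache"))      // grep "apache" (case-sensitive)
                .map(line -> line.toUpperCase(Locale.ROOT))   // tr a-z A-Z
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stands in for in.txt, the source end of the pipe.
        List<String> in = List.of("apache kafka", "something else", "the apache way");
        // Standard output stands in for out.txt, the sink end of the pipe.
        // Prints "APACHE KAFKA" and "THE APACHE WAY", one per line.
        transform(in).forEach(System.out::println);
    }
}
```

The difference the analogy points at is that a shell pipe runs once over a finite file, whereas a streaming application applies the same filter and map steps continuously to records as they arrive.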

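
Slide 19 lists windowing, event time semantics, and support for late-arriving and out-of-order events. The idea behind event time windowing can be sketched in plain Java: each event is assigned to a window by the timestamp it carries, not by the order in which it arrives. This is a conceptual sketch only, not how Kafka Streams implements windowing; the names EventTimeWindows, Event, and countPerWindow are hypothetical, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of tumbling windows keyed by event time.
// Hypothetical names; this does not use or mirror the Kafka Streams API.
final class EventTimeWindows {

    // An event that carries its own timestamp (assumed non-negative, in ms).
    static final class Event {
        final String key;
        final long timestampMs;

        Event(String key, long timestampMs) {
            this.key = key;
            this.timestampMs = timestampMs;
        }
    }

    // Counts events per tumbling window; the map key is the window start in ms.
    static Map<Long, Long> countPerWindow(List<Event> events, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (Event e : events) {
            // The window is chosen from the event's timestamp, so arrival order is irrelevant.
            long windowStart = e.timestampMs - (e.timestampMs % windowSizeMs);
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Event "c" arrives last but belongs to the first one-minute window.
        List<Event> arrivals = List.of(
                new Event("a", 1_000L),
                new Event("b", 61_000L),
                new Event("c", 2_000L));
        // Prints {0=2, 60000=1}
        System.out.println(countPerWindow(arrivals, 60_000L));
    }
}
```

A real streaming system also has to decide how long to keep a window open for late events; this sketch sidesteps that by processing a finite list.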