Apache Kafka - Scalable Message Processing and more!


Independent of the source of data, the integration of event streams into an enterprise architecture is becoming more and more important in a world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable message broker, built for exchanging huge amounts of messages between a source and a target.
This session starts with an introduction to Apache Kafka and presents its role in a modern data / information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.

Published in: Technology

  1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
     Apache Kafka - Scalable Message Processing and more!
     Guido Schmutz, @gschmutz, guidoschmutz.wordpress.com
  2. Guido Schmutz
     • Working at Trivadis for more than 20 years
     • Oracle ACE Director for Fusion Middleware and SOA
     • Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
     • Member of Trivadis Architecture Board
     • Technology Manager @ Trivadis
     • More than 30 years of software development experience
     • Contact: guido.schmutz@trivadis.com
     • Blog: http://guidoschmutz.wordpress.com
     • Slideshare: http://www.slideshare.net/gschmutz
     • Twitter: gschmutz
     8.12.2016, Big Data & Fast Data
  3. Agenda
     1. Introduction & Motivation
     2. Kafka Core
     3. Kafka Connect
     4. Kafka Streams
     5. Kafka and "Big Data" / "Fast Data" Ecosystem
     6. Confluent Data Platform
     7. Summary
  4. Introduction & Motivation
  5. Traditional Big Data Architecture
     (diagram; labels: Billing & Ordering, CRM / Profile, Marketing Campaigns; File Import / SQL Import; Big Data Cluster / Hadoop Cluster with NoSQL, Parallel Processing, Distributed Filesystem, Machine Learning, Graph Algorithms, Natural Language Processing; SQL, Search, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps)
  6. Event Hub – handle event stream data
     (diagram; labels: Location, Social, Click stream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Weather Data, Mobile Apps; Event Hub; Data Flow; Big Data Cluster / Hadoop Cluster with NoSQL, Parallel Processing, Distributed Filesystem, Machine Learning, Graph Algorithms, Natural Language Processing; SQL, Search, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps)
  7. Event Hub – taking Velocity into account
     (diagram; labels: Location, Social, Click stream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data; File Import / SQL Import; Event Hub; Batch Analytics on the Big Data Cluster / Hadoop Cluster with NoSQL, Parallel Processing, Distributed Filesystem; Streaming Analytics with Stream Analytics, NoSQL, Reference / Models; SQL, Search, Dashboard, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps)
  8. Kafka Stream Data Platform
     (diagram) Source: Confluent
  9. Kafka Core
  10. Apache Kafka - Overview
      • Distributed publish-subscribe messaging system
      • Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …)
      • Initially developed at LinkedIn, now part of Apache
      • Does not use JMS API and standards
      • Kafka maintains feeds of messages in topics
  11. Apache Kafka - Motivation
      LinkedIn's motivation for Kafka was:
      • "A unified platform for handling all the real-time data feeds a large company might have."
      Must haves
      • High throughput to support high volume event feeds.
      • Support real-time processing of these feeds to create new, derived feeds.
      • Support large data backlogs to handle periodic ingestion from offline systems.
      • Support low-latency delivery to handle more traditional messaging use cases.
      • Guarantee fault-tolerance in the presence of machine failures.
  12. Kafka High Level Architecture
      The who is who
      • Producers write data to brokers.
      • Consumers read data from brokers.
      • All this is distributed.
      The data
      • Data is stored in topics.
      • Topics are split into partitions, which are replicated.
      (diagram: three Producers and three Consumers around a Kafka Cluster with Broker 1, Broker 2, Broker 3 and a Zookeeper Ensemble)
  13. Apache Kafka - Architecture
      (diagram: Truck; Kafka Broker with Movement Topic and Engine-Metrics Topic, each holding messages 1 to 6; Movement Processor; Engine Processor)
  14. Apache Kafka - Architecture
      (diagram: Truck; Kafka Broker with Movement Topic and Engine-Metrics Topic, now shown with partitions labelled Partition 0, Partition 0 and Partition 1, each holding messages 1 to 6; two Movement Processors; Engine Processor)
  15. Apache Kafka
      (diagram: Truck; Movement Topic spread over three brokers, each partition holding messages 1 to 5; three Movement Processors)
      • Kafka Broker 1: P 0, P 2
      • Kafka Broker 2: P 2, P 1
      • Kafka Broker 3: P 0, P 1
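The broker layout on the slide above (three partitions, each stored on two of three brokers) can be mimicked with a small sketch. This is a simplified round-robin placement written for illustration only; the function name `assign_replicas` is hypothetical, and Kafka's real replica assignment logic is more involved and will generally produce a different layout.

```python
from collections import defaultdict


def assign_replicas(num_partitions, num_brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin).

    Illustrative only: brokers are numbered 1..num_brokers as on the slide.
    """
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [(p + i) % num_brokers + 1 for i in range(replication_factor)]
        for p in range(num_partitions)
    }


assignment = assign_replicas(num_partitions=3, num_brokers=3, replication_factor=2)

# Invert the mapping to see which partitions each broker hosts.
partitions_by_broker = defaultdict(list)
for partition, brokers in assignment.items():
    for broker in brokers:
        partitions_by_broker[broker].append(partition)

for broker in sorted(partitions_by_broker):
    print(f"Broker {broker} hosts partitions {partitions_by_broker[broker]}")
```

As on the slide, every broker ends up with two partition replicas and every partition lives on two different brokers, so losing one broker still leaves a copy of each partition.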
  16. Apache Kafka - Architecture
      • Write Ahead Log / Commit Log
      • Producers always append to tail
      • think append to file
      (diagram: Truck appends message 6 to the Movement Topic on the Kafka Broker, after messages 1 to 5)
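The "append to tail" idea on the slide above can be sketched as an in-memory, append-only list. The class `PartitionLog` and the message values are hypothetical names chosen for this illustration; a real broker writes to segment files on disk.

```python
class PartitionLog:
    """Minimal in-memory stand-in for one partition's commit log."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append at the tail and return the offset assigned to the record."""
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Read up to max_records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(f"movement-{n}") for n in range(1, 7)]
print(offsets)
print(log.read(4))
```

Existing records are never modified in this model; writers only ever add to the end, which is why offsets grow monotonically.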
  17. Durability Guarantees
      Producer can configure acknowledgements
      Value | Impact | Durability
      0 | Producer doesn't wait for leader | weak
      1 (default) | Producer waits for leader; leader sends ack when message written to log; no wait for followers | medium
      all | Producer waits for leader; leader sends ack when all In-Sync Replicas have acknowledged | strong
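The three acknowledgement levels in the table above can be expressed as a small decision function. This is a toy model of the rule, not producer client code; the function `is_acknowledged` and its parameters are assumptions made for the sketch.

```python
def is_acknowledged(acks, leader_written, followers_acked, in_sync_followers):
    """Return True once the producer may treat a send as complete.

    acks              -- "0", "1" or "all", as in the table above
    leader_written    -- leader has written the message to its log
    followers_acked   -- number of in-sync followers that have acknowledged
    in_sync_followers -- number of followers currently in the in-sync set
    """
    if acks == "0":
        return True  # producer does not wait for the leader at all
    if acks == "1":
        return leader_written  # leader only, no wait for followers
    if acks == "all":
        return leader_written and followers_acked >= in_sync_followers
    raise ValueError(f"unknown acks setting: {acks!r}")


# Leader has written the message, one of two in-sync followers has acknowledged.
for setting in ("0", "1", "all"):
    print(setting, is_acknowledged(setting, True, 1, 2))
```

The trade-off is latency against durability: the stricter the setting, the longer the producer waits before it considers the message safe.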
  18. Apache Kafka - Partition offsets
      Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset
      • Consumers track their pointers via (offset, partition, topic) tuples
      (diagram: Consumer Group A and Consumer Group B reading the same partitions) Source: Apache Kafka
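A minimal sketch of the offset tracking described above: each consumer group keeps its own position per topic and partition, so two groups can read the same messages independently. `OffsetStore` and `poll` are hypothetical helpers for this illustration and do not mirror the Kafka consumer API.

```python
class OffsetStore:
    """Tracks the next offset to read per (group, topic, partition)."""

    def __init__(self):
        self._committed = {}

    def position(self, group, topic, partition):
        return self._committed.get((group, topic, partition), 0)

    def commit(self, group, topic, partition, offset):
        self._committed[(group, topic, partition)] = offset


def poll(log, store, group, topic, partition, max_records):
    """Read the next batch for a group and advance only that group's offset."""
    start = store.position(group, topic, partition)
    batch = log[start:start + max_records]
    store.commit(group, topic, partition, start + len(batch))
    return batch


movement_p0 = ["m1", "m2", "m3", "m4", "m5", "m6"]  # one partition's messages
store = OffsetStore()

batch_a = poll(movement_p0, store, "group-A", "movement", 0, max_records=4)
batch_b = poll(movement_p0, store, "group-B", "movement", 0, max_records=2)
print(batch_a, store.position("group-A", "movement", 0))
print(batch_b, store.position("group-B", "movement", 0))
```

Because reading does not remove anything from the log, a slow group never holds back a fast one; each simply has a different offset.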
  19. Data Retention – 4 options
      1. Never
      2. Time based (TTL): log.retention.{ms | minutes | hours}
      3. Size based: log.retention.bytes
      4. Log compaction based (older entries with the same key are removed, the latest entry per key is kept)
      kafka-topics.sh --zookeeper localhost:2181 --create --topic customers --replication-factor 1 --partitions 1 --config cleanup.policy=compact
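To make the difference between time-based retention and log compaction concrete, here is a small sketch over a list of records. The record layout `(timestamp_ms, key, value)` and both function names are assumptions for the illustration; a real broker applies retention to whole log segments and compacts in the background.

```python
def expire_by_age(records, now_ms, retention_ms):
    """Time-based retention: drop records older than retention_ms."""
    return [r for r in records if now_ms - r[0] <= retention_ms]


def compact(records):
    """Log compaction: keep only the latest record per key, in log order."""
    latest_index = {key: i for i, (_, key, _) in enumerate(records)}
    return [r for i, r in enumerate(records) if latest_index[r[1]] == i]


# (timestamp_ms, key, value) entries in a hypothetical "customers" topic
customers = [
    (1000, "c1", "Alice"),
    (2000, "c2", "Bob"),
    (3000, "c1", "Alice Smith"),
]

print(expire_by_age(customers, now_ms=3500, retention_ms=2000))
print(compact(customers))
```

Time-based retention forgets old data regardless of key, while compaction keeps the most recent state for every key, which suits topics that represent a table of current values such as customers.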
  20. Apache Kafka – Some numbers
      Kafka at LinkedIn => 1800+ broker machines / 79K+ topics
      Kafka performance on our own infrastructure => 6 brokers (VM) / 1 cluster
      • 445,622 messages/second
      • 31 MB/second
      • 3.0405 ms average latency between producer / consumer
      Further figures on the slide
      • 1.3 trillion messages per day
      • 330 terabytes in/day
      • 1.2 petabytes out/day
      • Peak load for a single cluster: 2 million messages/sec, 4.7 gigabits/sec inbound, 15 gigabits/sec outbound
      Sources
      • http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
      • https://engineering.linkedin.com/kafka/running-kafka-scale
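A quick back-of-the-envelope check helps to read the figures above. The sketch assumes that "MB" means 10**6 bytes and that the daily total is spread evenly over 24 hours; neither assumption is stated on the slide.

```python
# Own-infrastructure benchmark: implied average message size.
messages_per_second = 445_622
bytes_per_second = 31 * 1_000_000  # assumption: 1 MB = 10**6 bytes
avg_message_bytes = bytes_per_second / messages_per_second

# Daily total on the slide: implied average rate if spread evenly over a day.
messages_per_day = 1.3e12
seconds_per_day = 24 * 60 * 60
avg_rate_per_second = messages_per_day / seconds_per_day

print(f"average message size: {avg_message_bytes:.1f} bytes")
print(f"average rate: {avg_rate_per_second / 1e6:.1f} million messages/second")
```

The benchmark therefore used small messages of roughly 70 bytes, and 1.3 trillion messages per day corresponds to an average of about 15 million messages per second.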
  21. Kafka Topics
      Creating a topic
      • Command line interface
      • Using AdminUtils.createTopic method
      • Auto-create via auto.create.topics.enable = true
      Modifying a topic
      • https://kafka.apache.org/documentation.html#basic_ops_modify_topic
      Deleting a topic
      • Command line interface
      Example (creating a topic):
      $ kafka-topics.sh --zookeeper zk1:2181 --create --topic my.topic --partitions 3 --replication-factor 2 --config x=y
  22. Inspecting the current state of a topic
      Use the --describe option
      • Leader: broker ID of the currently elected leader broker
      • Replica IDs = broker IDs
      • ISR = "in-sync replica", replicas that are in sync with the leader
      $ kafka-topics.sh --zookeeper zk1:2181 --describe --topic my.topic
      Topic: my.topic PartitionCount:3 ReplicationFactor:2 Configs:
      Topic: my.topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
      Topic: my.topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
      Topic: my.topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
      In this example:
      • Broker 0 is leader for partition 1.
      • Broker 1 is leader for partitions 0 and 2.
      • All replicas are in-sync with their respective leader partitions.
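The conclusions drawn from the --describe output above can be reproduced by parsing the partition rows. This is a sketch that assumes the row layout shown on the slide; the regular expression and the helper `parse_partitions` are hypothetical and not part of any Kafka tooling.

```python
import re
from collections import defaultdict

# Partition rows as shown on the slide.
DESCRIBE_OUTPUT = """\
Topic: my.topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: my.topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: my.topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
"""

ROW = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)"
)


def parse_partitions(text):
    """Turn each partition row into a dict of ints and broker-id lists."""
    rows = []
    for match in ROW.finditer(text):
        partition, leader, replicas, isr = match.groups()
        rows.append({
            "partition": int(partition),
            "leader": int(leader),
            "replicas": [int(b) for b in replicas.split(",")],
            "isr": [int(b) for b in isr.split(",")],
        })
    return rows


rows = parse_partitions(DESCRIBE_OUTPUT)

# Group partitions by their leader broker.
leaders = defaultdict(list)
for row in rows:
    leaders[row["leader"]].append(row["partition"])

# A partition is fully in sync when its ISR contains every replica.
fully_in_sync = all(sorted(r["replicas"]) == sorted(r["isr"]) for r in rows)

print(dict(leaders))
print(fully_in_sync)
```

A partition whose Isr list is shorter than its Replicas list would indicate a follower that has fallen behind, which is the situation worth alerting on.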
  23. Kafka Connect
  24. Kafka Connect Architecture
      (diagram) Source: Confluent
  25. Kafka Connector Hub – Certified Connectors
      Source: http://www.confluent.io/product/connectors
  26. Kafka Connector Hub – Additional Connectors
      Source: http://www.confluent.io/product/connectors
  27. Kafka Streams
  28. Kafka Streams
      • Designed as a simple and lightweight library in Apache Kafka
      • No external dependencies on systems other than Apache Kafka
      • Leverages Kafka as its internal messaging layer
      • Agnostic to resource management and configuration tools
      • Supports fault-tolerant local state
      • Event-at-a-time processing (not microbatch) with millisecond latency
      • Windowing with out-of-order data using a Google DataFlow-like model
  29. Kafka Streams - Architecture
      • A topology defines the stream processing computational logic for your application.
      • A topology is a graph of stream processors (nodes) that are connected by streams (edges).
      • A source processor is a stream processor that does not have any upstream processors.
      • A sink processor is a special type of stream processor that does not have downstream processors.
      Source: Confluent
  30. Kafka Streams - Processor Topology
      • A topology defines the stream processing computational logic for your application.
      • A topology is a graph of stream processors (nodes) that are connected by streams (edges).
      • A source processor is a stream processor that does not have any upstream processors. It consumes one or more Kafka topics.
      • A sink processor is a special type of stream processor that does not have downstream processors. It produces to a single Kafka topic.
      Source: Confluent
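The topology terms above (source processor, stream processors as nodes, streams as edges, sink processor) can be illustrated with a toy graph that handles one event at a time. This is plain Python and not the Kafka Streams API; the classes `Node` and `Sink`, the truck events and the speed threshold are all made up for the sketch.

```python
class Node:
    """A stream processor; edges to downstream processors are its children."""

    def __init__(self, name, fn=None):
        self.name = name
        self.fn = fn  # maps one record to a list of zero or more output records
        self.children = []

    def to(self, child):
        """Connect this node to a downstream node and return that node."""
        self.children.append(child)
        return child

    def process(self, record):
        """Handle a single record and forward each output downstream."""
        outputs = [record] if self.fn is None else self.fn(record)
        for out in outputs:
            for child in self.children:
                child.process(out)


class Sink(Node):
    """A processor without downstream processors; collects its output."""

    def __init__(self, name):
        super().__init__(name)
        self.topic = []  # stands in for the single output topic

    def process(self, record):
        self.topic.append(record)


# source -> filter -> map -> sink
source = Node("movement-source")
speeding = source.to(Node("filter-speeding", lambda r: [r] if r["speed"] > 80 else []))
alerts = speeding.to(Node("to-alert", lambda r: [f"truck {r['truck']} at {r['speed']} km/h"]))
sink = alerts.to(Sink("alerts-sink"))

# Event-at-a-time processing: each event flows through the graph on arrival.
for event in [{"truck": 1, "speed": 70}, {"truck": 2, "speed": 95}]:
    source.process(event)

print(sink.topic)
```

In the sketch the source has no upstream node and the sink has no downstream node, which matches the definitions on the slides; everything in between transforms or drops records as they pass.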
  31. Kafka and "Big Data" / "Fast Data" Ecosystem
  32. Kafka and the Big Data / Fast Data ecosystem
      Kafka integrates with many popular products / frameworks
      • Apache Spark Streaming
      • Apache Flink
      • Apache Storm
      • Apache NiFi
      • Streamsets
      • Apache Flume
      • Oracle Stream Analytics
      • Oracle Service Bus
      • Oracle GoldenGate
      • Spring Integration Kafka Support
      • …
      Note on Storm: built-in Kafka Spout to consume events from Kafka
  33. Confluent Platform
  34. Confluent Data Platform 3.1
      (diagram) Source: Confluent
  35. Summary
  36. Customer Event Hub – mapping of technologies
      (diagram; labels: Location, Social, Click stream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data; SQL Import; Event Hub; Batch Analytics on the Hadoop Cluster with NoSQL, Parallel Processing, Distributed Filesystem; Streaming Analytics with Stream Analytics, NoSQL, Reference / Models; SQL, Search, Dashboard, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps)
  37. Summary
      • Kafka can scale to millions of messages per second, and more
      • Easy to start with for a PoC
      • Setting up a production environment takes somewhat more investment
      • Monitoring is key
      • Vibrant community and ecosystem
      • Fast-paced technology
      • Confluent provides a Kafka distribution
  38. Guido Schmutz, Technology Manager
      guido.schmutz@trivadis.com, @gschmutz, guidoschmutz.wordpress.com
