SlideShare a Scribd company logo
1 of 40
SPRINGONE2GX
WASHINGTON, DC
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Developing Real-Time Data
Pipelines with Apache Kafka
Joe Stein
@allthingshadoop
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
CEO of Elodina http://www.elodina.net/ a big data as a service platform built on top open source
software. The Elodina platform enables customers to analyze data streams and programmatically
react to the results in real-time. We solve today’s data analytics needs by providing the tools and
support necessary to utilize open source technologies. As users, contributors and committers,
Elodina also provides support for frameworks that run on Mesos including Apache Kafka, Exhibitor
(Zookeeper), Apache Storm, Apache Cassandra and a whole lot more!
Apache Kafka Committer & PMC Member
LinkedIn: http://linkedin.com/in/charmalloc
Twitter : @allthingshadoop
whoami
2
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Contents
• Introduction To Kafka
• Overview
• Topics, Partitions & Segments
• Data Durability
• Replication
• Producers
• Consumers
• Performance
• Integration
• Quick Start
• Operations
3
• Designs
• Distributed RPC
o Request
o Process
o Response
• Storage & Analytics
o Stream
o Transform
o Analyze
o Store
o Search
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Apache Kafka
4
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Apache Kafka
Apache Kafka was first open sourced by LinkedIn in 2011
Papers
● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf
● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en-
us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf
● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction
http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://kafka.apache.org/
5
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
How Big Data Usually Starts
6
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
More Big Data!
7
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Ah!
8
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
eesh
9
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka de-couples data pipelines
10
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Distributed Replicated Log
Read and write
In real time
As much as you want
As fast as your network can go
11
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Topics and Partitions
12
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Log Segments
13
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Distributed Replicated Log
14
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Data Durability
15
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Replication
16
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Producers
17
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Consumers
18
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Consumer Failover
19
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Producer Performance
20
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Consumer Performance
http://kafka.apache.org/documentation.html#maximizingefficiency
21
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Client Libraries
Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
● Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer
implementations included, GZIP and Snappy compression supported.
● Python - Pure Python implementation with full protocol support. Consumer and Producer
implementations included, GZIP and Snappy compression supported.
● C - High performance C library with full protocol support
● Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy
compression supported. Ruby 1.9.3 and up (CI runs MRI 2.
● Clojure - Clojure DSL for the Kafka API
● JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation
Wire Protocol Developer's Guide
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
22
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Spring Integration
Good blog about it https://spring.io/blog/2015/04/15/using-apache-kafka-for-
integration-and-data-processing-pipelines-with-spring
Kafka Integration Source https://github.com/spring-projects/spring-integration-
kafka
Spring XD samples
https://github.com/spring-projects/spring-xd-samples/tree/master/kafka-source
23
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Quick Start
https://kafka.apache.org/documentation.html#quickstart
Download the 0.8.2.2 release and un-tar it.
> tar -xzf kafka_2.10-0.8.2.2.tgz
> cd kafka_2.10-0.8.2.2
(use at least four terminal windows)
> bin/zookeeper-server-start.sh config/zookeeper.properties
> bin/kafka-server-start.sh config/server.properties
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
This is a message
This is another message
24
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Operationalizing Kafka
https://kafka.apache.org/documentation.html#basic_ops
Basic Kafka Operations
● Adding and removing topics
● Modifying topics
● Graceful shutdown
● Balancing leadership
● Checking consumer position
● Mirroring data between clusters
● Expanding your cluster
● Decommissioning brokers
● Increasing replication factor
25
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Running on Mesos
26
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Static Partitioning
27
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Scaling is manual (even if orchestrated)
28
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Static failures require manual intervention
29
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Application Elasticity
30
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
An operating system for your data center
31
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Everything goes on Mesos
32
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka on Mesos
https://github.com/mesos/kafka
● smart broker.id assignment.
● preservation of broker placement (through constraints and/or new features).
● ability to-do configuration changes.
● rolling restarts (for things like configuration changes).
● scaling the cluster up and down with automatic, programmatic and manual
options.
● smart partition assignment via constraints visa vi roles, resources and
attributes.
33
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka on Mesos
Scheduler
● Provides the operational automation for a Kafka Cluster.
● Manages the changes to the broker's configuration.
● Exposes a REST API for the CLI to use or any other client.
● Runs on Marathon for high availability.
Executor
● The executor interacts with the kafka broker as an intermediary to the scheduler
34
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
REST API & CLI
● scheduler - starts the scheduler.
● add - adds one more more brokers to the cluster.
● update - changes resources, constraints or broker properties one or more brokers.
● remove - take a broker out of the cluster.
● start - starts a broker up.
● stop - this can either a graceful shutdown or will force kill it (./kafka-mesos.sh help stop)
● rebalance - allows you to rebalance a cluster either by selecting the brokers or topics to rebalance. Manual
assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication factor
on a topic.
● help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command}
35
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Launch 20 brokers in seconds
36
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Kafka 0.9
KIP (Kafka Improvement Process)
• https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
New Consumer
• https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
Security
• https://cwiki.apache.org/confluence/display/KAFKA/Security
• https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=51809888
JIRA
• https://issues.apache.org/jira/browse/KAFKA/fixforversion/12328745/?selectedTab=com.atlassian.jira
.jira-projects-plugin:version-issues-panel
37
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Distributed RPC
38
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Reference Architecture
39
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Questions?
http://www.elodina.net
40

More Related Content

What's hot

Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collectorMuga Nishizawa
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)W2O Group
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingJack Gudenkauf
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaAbhinav Singh
 
Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)Timothy Spann
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wilddatamantra
 
Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022Timothy Spann
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache KafkaJoe Stein
 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsTimothy Spann
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonSingleStore
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
Spark optimization
Spark optimizationSpark optimization
Spark optimizationAnkit Beohar
 

What's hot (20)

Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collector
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Cooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython NotebookCooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython Notebook
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
 
Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wild
 
Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Spark optimization
Spark optimizationSpark optimization
Spark optimization
 

Viewers also liked

Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache MesosJoe Stein
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...Spark Summit
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaJoe Stein
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Michael Noll
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011Joe Stein
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsJoe Stein
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Joe Stein
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on MesosJoe Stein
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0Joe Stein
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosJoe Stein
 
Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and visionStephan Ewen
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache KafkaJoe Stein
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosJoe Stein
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 

Viewers also liked (20)

Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming by Ew...
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
 
Data Pipeline with Kafka
Data Pipeline with KafkaData Pipeline with Kafka
Data Pipeline with Kafka
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite Columns
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache Mesos
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and vision
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 

Similar to Developing Real-Time Data Pipelines with Apache Kafka

Cassandra and DataStax Enterprise on PCF
Cassandra and DataStax Enterprise on PCFCassandra and DataStax Enterprise on PCF
Cassandra and DataStax Enterprise on PCFVMware Tanzu
 
Building Highly Scalable Spring Applications using In-Memory Data Grids
Building Highly Scalable Spring Applications using In-Memory Data GridsBuilding Highly Scalable Spring Applications using In-Memory Data Grids
Building Highly Scalable Spring Applications using In-Memory Data GridsJohn Blum
 
Lattice: A Cloud-Native Platform for Your Spring Applications
Lattice: A Cloud-Native Platform for Your Spring ApplicationsLattice: A Cloud-Native Platform for Your Spring Applications
Lattice: A Cloud-Native Platform for Your Spring ApplicationsMatt Stine
 
Ratpack - SpringOne2GX 2015
Ratpack - SpringOne2GX 2015Ratpack - SpringOne2GX 2015
Ratpack - SpringOne2GX 2015Daniel Woods
 
SpringOnePlatform2017 recap
SpringOnePlatform2017 recapSpringOnePlatform2017 recap
SpringOnePlatform2017 recapminseok kim
 
12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...cornelia davis
 
12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...VMware Tanzu
 
Federated Queries with HAWQ - SQL on Hadoop and Beyond
Federated Queries with HAWQ - SQL on Hadoop and BeyondFederated Queries with HAWQ - SQL on Hadoop and Beyond
Federated Queries with HAWQ - SQL on Hadoop and BeyondChristian Tzolov
 
Cloud-Native Streaming and Event-Driven Microservices
Cloud-Native Streaming and Event-Driven MicroservicesCloud-Native Streaming and Event-Driven Microservices
Cloud-Native Streaming and Event-Driven MicroservicesVMware Tanzu
 
Introduction to Reactive Streams and Reactor 2.5
Introduction to Reactive Streams and Reactor 2.5Introduction to Reactive Streams and Reactor 2.5
Introduction to Reactive Streams and Reactor 2.5Stéphane Maldini
 
riffing on Knative - Scott Andrews
riffing on Knative - Scott Andrewsriffing on Knative - Scott Andrews
riffing on Knative - Scott AndrewsVMware Tanzu
 
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...VMware Tanzu
 
Under the Hood of Reactive Data Access (2/2)
Under the Hood of Reactive Data Access (2/2)Under the Hood of Reactive Data Access (2/2)
Under the Hood of Reactive Data Access (2/2)VMware Tanzu
 
Migrating to Angular 5 for Spring Developers
Migrating to Angular 5 for Spring DevelopersMigrating to Angular 5 for Spring Developers
Migrating to Angular 5 for Spring DevelopersGunnar Hillert
 
State of Securing Restful APIs s12gx2015
State of Securing Restful APIs s12gx2015State of Securing Restful APIs s12gx2015
State of Securing Restful APIs s12gx2015robwinch
 
Migrating to Angular 4 for Spring Developers
Migrating to Angular 4 for Spring Developers Migrating to Angular 4 for Spring Developers
Migrating to Angular 4 for Spring Developers VMware Tanzu
 
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...confluent
 
Connecting All Abstractions with Istio
Connecting All Abstractions with IstioConnecting All Abstractions with Istio
Connecting All Abstractions with IstioVMware Tanzu
 
Cloud Native Java with Spring Cloud Services
Cloud Native Java with Spring Cloud ServicesCloud Native Java with Spring Cloud Services
Cloud Native Java with Spring Cloud ServicesVMware Tanzu
 

Similar to Developing Real-Time Data Pipelines with Apache Kafka (20)

Cassandra and DataStax Enterprise on PCF
Cassandra and DataStax Enterprise on PCFCassandra and DataStax Enterprise on PCF
Cassandra and DataStax Enterprise on PCF
 
Building Highly Scalable Spring Applications using In-Memory Data Grids
Building Highly Scalable Spring Applications using In-Memory Data GridsBuilding Highly Scalable Spring Applications using In-Memory Data Grids
Building Highly Scalable Spring Applications using In-Memory Data Grids
 
Lattice: A Cloud-Native Platform for Your Spring Applications
Lattice: A Cloud-Native Platform for Your Spring ApplicationsLattice: A Cloud-Native Platform for Your Spring Applications
Lattice: A Cloud-Native Platform for Your Spring Applications
 
Ratpack - SpringOne2GX 2015
Ratpack - SpringOne2GX 2015Ratpack - SpringOne2GX 2015
Ratpack - SpringOne2GX 2015
 
SpringOnePlatform2017 recap
SpringOnePlatform2017 recapSpringOnePlatform2017 recap
SpringOnePlatform2017 recap
 
12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps – What EXACTLY Does that Mean for Spring Deve...
 
12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...
12 Factor, or Cloud Native Apps - What EXACTLY Does that Mean for Spring Deve...
 
Federated Queries with HAWQ - SQL on Hadoop and Beyond
Federated Queries with HAWQ - SQL on Hadoop and BeyondFederated Queries with HAWQ - SQL on Hadoop and Beyond
Federated Queries with HAWQ - SQL on Hadoop and Beyond
 
Cloud-Native Streaming and Event-Driven Microservices
Cloud-Native Streaming and Event-Driven MicroservicesCloud-Native Streaming and Event-Driven Microservices
Cloud-Native Streaming and Event-Driven Microservices
 
Introduction to Reactive Streams and Reactor 2.5
Introduction to Reactive Streams and Reactor 2.5Introduction to Reactive Streams and Reactor 2.5
Introduction to Reactive Streams and Reactor 2.5
 
riffing on Knative - Scott Andrews
riffing on Knative - Scott Andrewsriffing on Knative - Scott Andrews
riffing on Knative - Scott Andrews
 
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
 
Under the Hood of Reactive Data Access (2/2)
Under the Hood of Reactive Data Access (2/2)Under the Hood of Reactive Data Access (2/2)
Under the Hood of Reactive Data Access (2/2)
 
Migrating to Angular 5 for Spring Developers
Migrating to Angular 5 for Spring DevelopersMigrating to Angular 5 for Spring Developers
Migrating to Angular 5 for Spring Developers
 
State of Securing Restful APIs s12gx2015
State of Securing Restful APIs s12gx2015State of Securing Restful APIs s12gx2015
State of Securing Restful APIs s12gx2015
 
Migrating to Angular 4 for Spring Developers
Migrating to Angular 4 for Spring Developers Migrating to Angular 4 for Spring Developers
Migrating to Angular 4 for Spring Developers
 
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
 
Connecting All Abstractions with Istio
Connecting All Abstractions with IstioConnecting All Abstractions with Istio
Connecting All Abstractions with Istio
 
Reactive Web Applications
Reactive Web ApplicationsReactive Web Applications
Reactive Web Applications
 
Cloud Native Java with Spring Cloud Services
Cloud Native Java with Spring Cloud ServicesCloud Native Java with Spring Cloud Services
Cloud Native Java with Spring Cloud Services
 

More from Joe Stein

Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosJoe Stein
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosJoe Stein
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache MesosJoe Stein
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonJoe Stein
 

More from Joe Stein (7)

Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache Mesos
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache Mesos
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Developing Real-Time Data Pipelines with Apache Kafka

  • 1. SPRINGONE2GX WASHINGTON, DC Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Developing Real-Time Data Pipelines with Apache Kafka Joe Stein @allthingshadoop
  • 2. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ CEO of Elodina http://www.elodina.net/ a big data as a service platform built on top open source software. The Elodina platform enables customers to analyze data streams and programmatically react to the results in real-time. We solve today’s data analytics needs by providing the tools and support necessary to utilize open source technologies. As users, contributors and committers, Elodina also provides support for frameworks that run on Mesos including Apache Kafka, Exhibitor (Zookeeper), Apache Storm, Apache Cassandra and a whole lot more! Apache Kafka Committer & PMC Member LinkedIn: http://linkedin.com/in/charmalloc Twitter : @allthingshadoop whoami 2
  • 3. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Contents • Introduction To Kafka • Overview • Topics, Partitions & Segments • Data Durability • Replication • Producers • Consumers • Performance • Integration • Quick Start • Operations 3 • Designs • Distributed RPC o Request o Process o Response • Storage & Analytics o Stream o Transform o Analyze o Store o Search
  • 4. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Apache Kafka 4
  • 5. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Apache Kafka Apache Kafka was first open sourced by LinkedIn in 2011 Papers ● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf ● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en- us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf ● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf ● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying http://kafka.apache.org/ 5
  • 6. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ How Big Data Usually Starts 6
  • 7. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ More Big Data! 7
  • 8. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Ah! 8
  • 9. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ eesh 9
  • 10. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Kafka de-couples data pipelines 10
  • 11. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Distributed Replicated Log Read and write In real time As much as you want As fast as your network can go 11
  • 12. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Topics and Partitions 12
  • 13. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Log Segments 13
  • 14. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Distributed Replicated Log 14
  • 15. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Data Durability 15
  • 16. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Replication 16
  • 17. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Producers 17
  • 18. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Consumers 18
  • 19. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Consumer Failover 19
  • 20. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Producer Performance 20 https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
  • 21. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Consumer Performance http://kafka.apache.org/documentation.html#maximizingefficiency 21
  • 22. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Client Libraries Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients ● Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. ● Python - Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. ● C - High performance C library with full protocol support ● Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2. ● Clojure - Clojure DSL for the Kafka API ● JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation Wire Protocol Developer's Guide https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol 22
  • 23. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Spring Integration Good blog about it https://spring.io/blog/2015/04/15/using-apache-kafka-for- integration-and-data-processing-pipelines-with-spring Kafka Integration Source https://github.com/spring-projects/spring-integration- kafka Spring XD samples https://github.com/spring-projects/spring-xd-samples/tree/master/kafka-source 23
  • 24. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Quick Start https://kafka.apache.org/documentation.html#quickstart Download the 0.8.2.2 release and un-tar it. > tar -xzf kafka_2.10-0.8.2.2.tgz > cd kafka_2.10-0.8.2.2 (use at least four terminal windows) > bin/zookeeper-server-start.sh config/zookeeper.properties > bin/kafka-server-start.sh config/server.properties > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test This is a message This is another message > bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning This is a message This is another message 24
  • 25. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Operationalizing Kafka https://kafka.apache.org/documentation.html#basic_ops Basic Kafka Operations ● Adding and removing topics ● Modifying topics ● Graceful shutdown ● Balancing leadership ● Checking consumer position ● Mirroring data between clusters ● Expanding your cluster ● Decommissioning brokers ● Increasing replication factor 25
  • 26. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Running on Mesos 26
  • 27. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Static Partitioning 27
  • 28. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Scaling is manual (even if orchestrated) 28
  • 29. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Static failures require manual intervention 29
  • 30. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Application Elasticity 30
  • 31. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ An operating system for your data center 31
  • 32. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Everything goes on Mesos 32
  • 33. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Kafka on Mesos https://github.com/mesos/kafka ● smart broker.id assignment. ● preservation of broker placement (through constraints and/or new features). ● ability to-do configuration changes. ● rolling restarts (for things like configuration changes). ● scaling the cluster up and down with automatic, programmatic and manual options. ● smart partition assignment via constraints visa vi roles, resources and attributes. 33
  • 34. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Kafka on Mesos Scheduler ● Provides the operational automation for a Kafka Cluster. ● Manages the changes to the broker's configuration. ● Exposes a REST API for the CLI to use or any other client. ● Runs on Marathon for high availability. Executor ● The executor interacts with the kafka broker as an intermediary to the scheduler 34
  • 35. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ REST API & CLI ● scheduler - starts the scheduler. ● add - adds one more more brokers to the cluster. ● update - changes resources, constraints or broker properties one or more brokers. ● remove - take a broker out of the cluster. ● start - starts a broker up. ● stop - this can either a graceful shutdown or will force kill it (./kafka-mesos.sh help stop) ● rebalance - allows you to rebalance a cluster either by selecting the brokers or topics to rebalance. Manual assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication factor on a topic. ● help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command} 35
  • 36. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Launch 20 brokers in seconds 36
  • 37. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Kafka 0.9 KIP (Kafka Improvement Process) • https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals New Consumer • https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design Security • https://cwiki.apache.org/confluence/display/KAFKA/Security • https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=51809888 JIRA • https://issues.apache.org/jira/browse/KAFKA/fixforversion/12328745/?selectedTab=com.atlassian.jira .jira-projects-plugin:version-issues-panel 37
  • 38. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Distributed RPC 38
  • 39. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Reference Architecture 39
  • 40. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Questions? http://www.elodina.net 40