BUILDING REALTIME DATA PIPELINES WITH KAFKA CONNECT AND SPARK STREAMING
Ewen Cheslack-Postava, Confluent
Spark Streaming makes it easy to build scalable, robust stream processing applications, but only once you’ve made your data accessible to the framework. If your data is already in one of Spark Streaming’s well-supported message queuing systems, this is easy. If not, an ad hoc import solution may work for a single application, but that approach quickly breaks down when scaled to complex data pipelines that integrate dozens of data sources and sinks with multi-stage processing. Spark Streaming solves the realtime data processing problem, but to build large-scale data pipelines we need to combine it with another tool that addresses data integration challenges.
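As a concrete illustration of the "well-supported" path the abstract mentions, a Spark Streaming job can consume from Kafka with the direct stream API from the spark-streaming-kafka-0-10 integration module. This is only a sketch under assumptions not stated in the talk: the broker address, topic name, and group id below are placeholders, and running it requires a Spark installation and a reachable Kafka cluster.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-stream-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Consumer configuration; broker address and group id are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-demo",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream: Spark reads partitions from Kafka without a receiver.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("pageviews"), kafkaParams)
    )

    // Simple word count over each 5-second micro-batch.
    stream.map(_.value)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The point of the separation of concerns argued for in the talk is that this job only contains processing logic; getting data from other systems into the "pageviews" topic (and results back out) is left to the data integration layer.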

The Apache Kafka project recently introduced a new tool, Kafka Connect, to make data import/export to and from Kafka easier. This talk will first describe some data pipeline anti-patterns we have observed and motivate the need for a tool designed specifically to bridge the gap between other data systems and stream processing frameworks. We will introduce Kafka Connect, starting with basic usage, its data model, and how a variety of systems can map to this model. Next, we’ll explain how building a tool specifically designed around Kafka allows for stronger guarantees, better scalability, and simpler operationalization compared to other general purpose data copying tools. Finally, we’ll describe how combining Kafka Connect and Spark Streaming, and the resulting separation of concerns, allows you to manage the complexity of building, maintaining, and monitoring large scale data pipelines.
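To make the "basic usage" concrete, Kafka Connect can be run as a standalone worker driven by properties files. The sketch below uses the FileStreamSource connector that ships with Apache Kafka; the file path and topic name are illustrative, not from the talk.

```properties
# connect-file-source.properties
# Tails a file and publishes each line to a Kafka topic.
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/var/log/app.log
topic=app-logs
```

A standalone worker would then be launched with the worker config bundled with Kafka, e.g. `bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties`. The same connector configuration can instead be submitted to a distributed Connect cluster over its REST API, which is where the scalability and fault-tolerance benefits described in the talk come in.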


  1. BUILDING REALTIME DATA PIPELINES WITH KAFKA CONNECT AND SPARK STREAMING (Ewen Cheslack-Postava, Confluent)
  2. Stream Processing
  3. Introducing Kafka Connect: Large-scale streaming data import/export for Kafka
  4. Separation of Concerns
  5. Data Integration as a Service
  6. THANK YOU @ewencp @confluentinc http://confluent.io/download/ http://connectors.confluent.io
