BUILDING REALTIME DATA PIPELINES WITH KAFKA CONNECT AND SPARK STREAMING
Ewen Cheslack-Postava
Confluent
Stream Processing
Introducing Kafka Connect
Large-scale streaming data import/export for Kafka
Separation of Concerns
Data Integration as a Service
THANK YOU
@ewencp
@confluentinc
http://confluent.io/download/
http://connectors.confluent.io
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spark Summit East Talk by Ewen Cheslack Postava
Spark Streaming makes it easy to build scalable, robust stream processing applications, but only once you’ve made your data accessible to the framework. If your data is already in one of Spark Streaming’s well-supported message queuing systems, this is easy. If not, an ad hoc solution to import data may work for a single application, but trying to scale that approach to complex data pipelines integrating dozens of data sources and sinks with multi-stage processing quickly breaks down. Spark Streaming solves the realtime data processing problem, but to build large-scale data pipelines we need to combine it with another tool that addresses data integration challenges.

The Apache Kafka project recently introduced a new tool, Kafka Connect, to make data import/export to and from Kafka easier. This talk will first describe some data pipeline anti-patterns we have observed and motivate the need for a tool designed specifically to bridge the gap between other data systems and stream processing frameworks. We will introduce Kafka Connect, starting with basic usage, its data model, and how a variety of systems can map to this model. Next, we’ll explain how building a tool specifically designed around Kafka allows for stronger guarantees, better scalability, and simpler operationalization compared to other general-purpose data copying tools. Finally, we’ll describe how combining Kafka Connect and Spark Streaming, and the resulting separation of concerns, allows you to manage the complexity of building, maintaining, and monitoring large-scale data pipelines.
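
The data model mentioned above can be illustrated with a small sketch. This is a toy Python model, not Kafka Connect's actual API (which is Java); the class and function names here (`SourceRecord` as a dataclass, `poll_lines`) and the file name and topic are hypothetical and chosen only for illustration. It assumes the general idea that each record copied into Kafka carries a source partition and a source offset, so that a restarted import can resume where it left off.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional


@dataclass(frozen=True)
class SourceRecord:
    """Toy stand-in for a record copied into Kafka (hypothetical, not the real class)."""
    source_partition: Dict[str, Any]  # which piece of the source system it came from
    source_offset: Dict[str, Any]     # position within that piece, used for resuming
    topic: str                        # destination Kafka topic
    value: Any                        # the payload itself


def poll_lines(
    name: str,
    lines: List[str],
    topic: str,
    committed: Optional[Dict[str, Any]] = None,
) -> Iterator[SourceRecord]:
    """Yield one record per line, skipping lines at or before the committed offset."""
    start = 0 if committed is None else committed["line"] + 1
    for i in range(start, len(lines)):
        yield SourceRecord({"file": name}, {"line": i}, topic, lines[i])


# A file maps to the model as: partition = the file, offset = the line index.
lines = ["a", "b", "c"]
first = list(poll_lines("events.log", lines, "events"))

# Simulate a restart after the second record was committed.
resumed = list(poll_lines("events.log", lines, "events", committed=first[1].source_offset))
```

In this sketch a file is treated as one partition and the line index as the offset. Other systems could map to the same shape in a similar way, for example a database table as the partition and a row ID or timestamp as the offset; that mapping is an assumption made here for illustration.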

  1. BUILDING REALTIME DATA PIPELINES WITH KAFKA CONNECT AND SPARK STREAMING Ewen Cheslack-Postava Confluent
  2. Stream Processing
  3. Introducing Kafka Connect Large-scale streaming data import/export for Kafka
  4. Separation of Concerns
  5. Data Integration as a Service
  6. THANK YOU @ewencp @confluentinc http://confluent.io/download/ http://connectors.confluent.io