2. Spark Streaming - Introduction
• Spark Streaming is an extension of the core Spark
API.
• It provides scalable, high-throughput, fault-tolerant
processing of live data streams.
• Data can be ingested from many sources, such as Kafka, Flume, Kinesis, or TCP sockets.
• Data can be processed using complex algorithms
expressed with high-level functions such as map, reduce, join, and window.
• Processed data can be pushed out to filesystems,
databases, and live dashboards.
• Spark’s machine learning and graph processing
algorithms can be applied on data streams.
Image Source: Official Spark Documentation
3. Overview
■ Spark Streaming receives live input data streams and divides the data into batches.
■ Batches are processed by the Spark engine to generate the final stream of results in
batches.
Image Source: Official Spark Documentation
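The slicing of a live stream into batches can be sketched in plain Python. This is a toy simulation, not Spark code: real Spark Streaming batches by a time interval rather than by record count, and the "engine" here is just an ordinary function applied per batch.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an iterator of records into fixed-size batches,
    mimicking how Spark Streaming slices a live stream into chunks."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Each batch is handed to the (here: simulated) engine independently,
# producing the final stream of results in batches.
results = [sum(batch) for batch in micro_batches(range(10), batch_size=4)]
# results is one output value per input batch
```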
4. DStreams
■ Spark Streaming provides a high-level abstraction called a discretized
stream, or DStream.
■ A DStream represents a continuous stream of data.
■ Can be created either from input data streams from sources such as Kafka, Flume, and
Kinesis, or by applying high-level operations on other DStreams.
■ Internally, a DStream is represented as a sequence of RDDs.
■ Each RDD in a DStream contains data from a certain interval.
Image Source: Official Spark Documentation
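The idea that each RDD covers one time interval can be modeled with ordinary Python data structures. This is a conceptual stand-in only: the lists below play the role of RDDs, and the two-second batch interval and one-record-per-second arrival rate are made-up assumptions for illustration.

```python
batch_interval = 2  # seconds; hypothetical interval, as in ssc setup

records = ["a", "b", "c", "d", "e"]

# Assign each record to the interval in which it "arrived"
# (assume exactly one record per second for this sketch).
dstream = {}
for t, rec in enumerate(records):
    interval = (t // batch_interval) * batch_interval
    dstream.setdefault(interval, []).append(rec)

# dstream now maps each interval start to the batch (the "RDD")
# containing the data that arrived during that interval.
```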
5. DStreams
■ Any operation applied on a DStream translates to operations on the underlying RDDs.
Image Source: Official Spark Documentation
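This per-batch translation can be sketched in plain Python: an operation on the DStream is simply applied to every underlying batch. Again a toy model, with lists standing in for RDDs; the `flatMap`-style word split mirrors the classic word-count example from the Spark documentation.

```python
# A "DStream" of lines, as two batches (the stand-in RDDs).
lines_dstream = [["hello world", "hi"], ["hello spark"]]

def transform(dstream, rdd_op):
    """Applying an operation to a DStream means applying the same
    operation to each underlying RDD, yielding a new DStream."""
    return [rdd_op(rdd) for rdd in dstream]

# A flatMap-like split into words, performed one batch at a time.
words_dstream = transform(
    lines_dstream,
    lambda rdd: [w for line in rdd for w in line.split(" ")],
)
```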
6. Built-in streaming sources
■ Basic sources: Sources directly available in the StreamingContext API. Examples: file
systems and socket connections.
■ Advanced sources: Sources like Kafka, Flume, Kinesis, etc. are available through extra
utility classes. These require linking against extra dependencies.
■ Multiple input DStreams can be created. This creates multiple receivers, which will
simultaneously receive multiple data streams.
■ The number of cores allocated to the Spark Streaming application must be greater than
the number of receivers; otherwise the application can receive data but has no core left to process it.
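The core-count rule can be illustrated with a small sketch. This is plain Python, not Spark: the two "receivers" are just lists of per-interval batches (imagined as, say, a Kafka topic and a Flume channel), merged as a union-style operation would merge them, and the arithmetic shows why each long-running receiver permanently ties up one core.

```python
# Two hypothetical receivers, each delivering one batch per interval.
receiver1 = [["k1", "k2"], ["k3"]]   # imagined Kafka stream
receiver2 = [["f1"], ["f2", "f3"]]   # imagined Flume stream

# Per interval, merge the batches from all receivers into one stream.
unioned = [b1 + b2 for b1, b2 in zip(receiver1, receiver2)]

# Each receiver occupies one core for the lifetime of the application,
# so with 2 receivers the app needs at least 3 cores:
# 2 to receive, plus at least 1 to actually process the batches.
n_receivers = 2
min_cores = n_receivers + 1
```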