Data Streaming For Big Data
CMP652 Next Generation Database
Systems
Seval Çapraz
Content
•
1. What, Why, How of Streaming Big Data
•
2. Overview of Data Management Systems
– Vendors, Architectures, Ecosystem
•
3. The Most Popular Streaming Technologies
– Apache Storm, Apache Flink, Spark Streaming
•
Summary
•
Questions and Answers
•
References
1. What, Why, How of Streaming Big Data
What is streaming data?
•
A streaming data platform is an analytic computing platform that is
focused on speed.
•
By streaming, data can be continuously analyzed and transformed in
memory before it is stored on a disk.
•
It is a real time processing technique.
● All definitions are taken from reference [1]
Why Streaming Data?
•
Businesses are dealing with a lot of data that needs to be
processed and analyzed in real time.
•
Therefore, the physical environment that supports this level of
responsiveness is critical.
•
Streaming data environments typically require a clustered
hardware solution, and sometimes a massively parallel processing
approach will be required to handle the analysis.
•
Defining properties or dimensions of big data are volume, variety,
and velocity. Streaming technology can cover these three.
● All definitions are taken from reference [1]
BIG DATA
How Stream Processing?
•
Stream processing is a computer programming paradigm closely related to
dataflow programming, event stream processing, and reactive
programming.
•
It is the real-time processing of data continuously, concurrently, and in a
record-by-record fashion.
•
Processing streams of data works by processing “time windows” of data in
memory across a cluster of servers.
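As a rough illustration of the "time windows" idea above, the sketch below counts records per key in fixed-size (tumbling) windows while reading the stream one record at a time. It is a single-process toy, not a clustered engine. The record shape `(timestamp, key)`, the function name, and the assumption that events arrive in timestamp order are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (timestamp in seconds, key)
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_size: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, counts_per_key) for each non-empty fixed-size window.

    Assumes events arrive in timestamp order, so only the currently open
    window has to be kept in memory.
    """
    current_start = None
    counts: Dict[str, int] = defaultdict(int)
    for timestamp, key in events:
        start = (timestamp // window_size) * window_size
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            counts = defaultdict(int)
            current_start = start
        counts[key] += 1
    if current_start is not None:
        # Flush the last open window when the stream ends.
        yield current_start, dict(counts)
```

Real engines also have to deal with late or out-of-order records and with windows spread across servers, which this sketch deliberately leaves out.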
Data Processing
Stream Processing
When to use streaming?
•
Some key principles define when using streams is most appropriate:
– When it is necessary to determine a retail buying opportunity
at the point of engagement, either via social media or via
permission-based messaging
– Collecting information about the movement around a secure
site
– To be able to react to an event that needs an immediate
response, such as a service outage or a change in a patient’s
medical condition
– Real-time calculation of costs that are dependent on variables
such as usage and available resources
● All definitions are taken from reference [1]
Single-pass Analysis
•
One important factor about streaming data analysis is the fact
that it is a single-pass analysis.
•
In other words, the analyst cannot reanalyze the data after it is
streamed.
•
This is common in applications where you are looking for the
absence of data.
•
If several passes are required, the data will have to be put into
some sort of warehouse where additional analysis can be
performed.
● All definitions are taken from reference [1]
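A minimal sketch of what single-pass analysis can look like in practice: each value is seen once, folded into a small running state, and then discarded. The first part keeps a running mean and variance using Welford's update rule (a technique chosen here for illustration, not taken from the slides). The second part flags gaps between consecutive timestamps as a simple stand-in for "looking for the absence of data". All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class RunningStats:
    """Mean and population variance updated one value at a time (Welford)."""

    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.count if self.count else 0.0


def find_gaps(
    timestamps: Iterable[float], max_gap: float
) -> Iterator[Tuple[float, float]]:
    """Yield (previous, current) whenever two consecutive timestamps are
    further apart than max_gap, i.e. data was absent for too long."""
    previous = None
    for ts in timestamps:
        if previous is not None and ts - previous > max_gap:
            yield previous, ts
        previous = ts
```

Neither helper stores the stream itself, which is why a second pass would need the data to be persisted elsewhere first.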
Streaming data vs. Hadoop
•
Managing streaming data is similar to managing
data at rest with
Hadoop.
•
The primary difference is the issue of velocity.
•
In the Hadoop cluster, data is collected in
batch mode and then processed.
● All definitions are taken from reference [1]
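The difference in velocity can be sketched with two toy functions: a batch-style function that needs the whole data set before it produces anything, and a streaming-style generator that has an up-to-date result after every record. This is plain Python for illustration only; it does not use Hadoop or any streaming engine, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_total(readings: List[float]) -> float:
    """Batch style: the data is collected first, then processed in one go."""
    return sum(readings)


def streaming_totals(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: an updated total is available after each record."""
    total = 0.0
    for value in readings:
        total += value
        yield total
```

Both arrive at the same final number; the streaming version simply makes intermediate answers available while data is still arriving.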
Speed matters less
in Hadoop
than it does in
data streaming.
2. Overview of Data Management
Systems
Evolution of Data Management Solutions
•
Relational Databases are not suited for Big Data
● All images are taken from reference [2]
Vendor Landscape
● All images are taken from reference [2]
An architecture of big data processing service
● All images are taken from reference [3]
Big Data Analytics Ecosystem
•
Recently, each architectural layer changed dramatically in terms of
the software stack when companies such as Yahoo!, Twitter, and
LinkedIn released open source solutions for dealing with big data.
•
The new architecture:
– Apache Kafka serves as a high-throughput distributed
messaging system in the data ingestion layer,
– Apache Storm as a distributed and fault-tolerant real-time
computation system in the data analytics layer,
– Apache Cassandra as a NoSQL database in the data storage layer.
● All definitions are taken from reference [3]
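The three layers above (ingestion, analytics, storage) can be mimicked in a few lines of plain Python to show how data moves between them. In this sketch a `queue.Queue` stands in for the messaging system, a word-counting function stands in for the computation layer, and a dictionary stands in for the database. None of this uses the Kafka, Storm, or Cassandra APIs; all names are hypothetical.

```python
import queue
from typing import Dict, Iterable


def ingest(lines: Iterable[str], buffer: queue.Queue) -> None:
    """Ingestion layer stand-in: publish raw records to a buffer."""
    for line in lines:
        buffer.put(line)


def analyze(buffer: queue.Queue, store: Dict[str, int]) -> None:
    """Analytics layer stand-in: consume records one by one and update
    per-word counts in the storage layer stand-in (a plain dict)."""
    while not buffer.empty():
        for word in buffer.get().lower().split():
            store[word] = store.get(word, 0) + 1


def run_pipeline(lines: Iterable[str]) -> Dict[str, int]:
    buffer: queue.Queue = queue.Queue()
    store: Dict[str, int] = {}
    ingest(lines, buffer)
    analyze(buffer, store)
    return store
```

In a real deployment each layer runs as its own distributed service, so the producer and consumer would work concurrently rather than one after the other.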
A simple instance of a large-scale data-stream-processing
service
● All images are taken from reference [3]
3. The Most Popular Streaming Technologies
Most Popular Technologies
•
Piping and Messaging
– Apache Kafka, Apache Flume, FluentD and ZeroMQ
•
Stream Processing
– Apache Storm, Apache Spark, Apache Flink, Esper, Apache Samza
•
Machine Learning
– MLlib and Mahout
•
Persisting
– NoSQL DBs
– HDFS
Capability Analysis of Recent Open Source Stream-Processing Systems
● Table is taken from reference [3]
[12] M. Zaharia et al., “Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters,”
Proc. 4th Usenix Conf. Hot Topics in Cloud Computing, 2012.
[13] L. Neumeyer et al., “S4: Distributed Stream Computing Platform,” Proc. IEEE Int’l Conf. on Data Mining Workshops, 2010,
pp. 170–177.
Some of Streaming Computation Engines
•
Three open-source streaming engines:
– Apache Storm
– Apache Flink
– Apache Spark Streaming
● All definitions and images are taken from reference [4]
Apache Storm
● All definitions and images are taken from reference [4]
•
Apache Storm is a free and open source distributed realtime
computation system.
•
Apache Storm has the TopologyBuilder API to create a directed graph
(topology) through which streams of data flow.
•
“Spouts” are the entry point to the graph, and “bolts” perform the
processing.
•
Data flows through the system as individual tuples.
•
Graphs are not necessarily acyclic (although that is often the case)
● All definitions are taken from reference [6]
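The spout and bolt roles can be imitated with Python generators to make the data flow concrete. This sketch is not the Storm API (Storm topologies are built with TopologyBuilder and run across a cluster). It only shows a source emitting tuples and two processing steps chained into a small graph. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout stand-in: the entry point that emits one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(tuples: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt stand-in: split each sentence tuple into one tuple per word."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield (word,)


def count_bolt(tuples: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt stand-in: keep a running count and emit (word, count) tuples."""
    counts: Dict[str, int] = {}
    for (word,) in tuples:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def run_topology(sentences: Iterable[str]) -> List[Tuple[str, int]]:
    """Wire spout -> split -> count, the shape of a simple linear topology."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))
```

Each tuple flows through the whole chain individually, which mirrors the tuple-at-a-time model described above.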
● All images are taken from reference [4]
•
Storm is fast: a benchmark clocked it at over a million tuples
processed per second per node.
•
A Storm topology consumes streams of data and processes those
streams in arbitrarily complex ways, repartitioning the streams
between each stage of the computation however needed.
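One common way to repartition a stream between stages is to route each tuple by a hash of its key, so that all tuples with the same key reach the same downstream task. The sketch below shows that idea in plain Python. It is similar in spirit to grouping a stream by a field, but it is not Storm code, and the function names and the use of CRC32 as the hash are assumptions made for the example.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, num_tasks: int) -> int:
    """Pick a downstream task for a key using a stable hash (CRC32).

    Python's built-in hash() for strings varies between runs, so a stable
    hash is used to keep routing repeatable.
    """
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def repartition_by_key(
    tuples: Iterable[Tuple[str, int]], num_tasks: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Distribute (key, value) tuples over num_tasks downstream partitions."""
    partitions: Dict[int, List[Tuple[str, int]]] = {
        task: [] for task in range(num_tasks)
    }
    for key, value in tuples:
        partitions[partition_for(key, num_tasks)].append((key, value))
    return partitions
```

Because routing depends only on the key, a counting stage downstream can keep per-key state locally without coordinating with other tasks.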
Apache Flink
•
Apache Flink is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data
streaming applications.[7]
•
Apache Flink has the DataStream API to perform operations on
streams of data (map, filter, reduce, join, etc.).
•
These operations are turned into a graph at job submission time by
Flink.
•
It works similarly to Storm’s model.
•
It also supports a Storm-compatible API.
● All definitions and images are taken from reference [4]
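The idea that operations are first recorded and only later turned into an executable graph can be sketched with a small lazy pipeline class. This is not the Flink DataStream API. `ToyStream` and its methods are hypothetical and only imitate the pattern: calling `map` or `filter` adds a node to a plan, and nothing runs until `execute` is called, at which point each event passes through the whole chain one at a time.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple

Operation = Tuple[str, Callable[[Any], Any]]


class ToyStream:
    """Records operations lazily; runs them only when execute() is called."""

    def __init__(self, ops: Optional[List[Operation]] = None) -> None:
        self._ops: List[Operation] = list(ops or [])

    def map(self, fn: Callable[[Any], Any]) -> "ToyStream":
        return ToyStream(self._ops + [("map", fn)])

    def filter(self, predicate: Callable[[Any], bool]) -> "ToyStream":
        return ToyStream(self._ops + [("filter", predicate)])

    def plan(self) -> List[str]:
        """The recorded operator chain, in order (a linear 'graph')."""
        return [name for name, _ in self._ops]

    def execute(self, source: Iterable[Any]) -> Iterator[Any]:
        """Push each event through every operator before reading the next."""
        for item in source:
            keep = True
            for name, fn in self._ops:
                if name == "map":
                    item = fn(item)
                elif not fn(item):  # a filter rejected the event
                    keep = False
                    break
            if keep:
                yield item
```

A usage example: `ToyStream().map(lambda x: x * 2).filter(lambda x: x > 4)` builds the plan `["map", "filter"]` without touching any data.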
•
Flink is designed to run on large-scale clusters with many thousands
of nodes, in addition to offering a standalone cluster mode.
•
Flink’s core is a distributed streaming dataflow engine, meaning that
data is processed one event at a time rather than as a series of
batches.
Apache Spark Streaming
•
Apache Spark is a fast and general engine for large-scale data
processing.
•
Spark Streaming has the DStream API to perform operations on streams
of data (map, filter, reduce, join, etc.). It is based on Spark’s RDD
(Resilient Distributed Dataset) abstraction.
•
It is similar to Flink’s API; however, streaming is accomplished
through micro-batches.
•
A Spark Streaming job consists of one small batch after another.
● All definitions and images are taken from reference [4]
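Micro-batching can be illustrated by cutting a stream into small lists and running an ordinary batch computation on each one. The sketch below is plain Python, not the DStream API. For simplicity it cuts batches by record count, whereas Spark Streaming cuts them by time interval. That simplification and all names are assumptions made for the example.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Cut a stream into consecutive small batches of at most batch_size items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_stream(stream: Iterable[int], batch_size: int) -> List[int]:
    """Run the same batch computation (here: sum) on one small batch after another."""
    return [sum(batch) for batch in micro_batches(stream, batch_size)]
```

The trade-off is visible even in this toy: a result appears only when a batch is complete, so the batch size puts a floor on latency.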
•
A Resilient Distributed Dataset (RDD) is the basic abstraction in
Spark.
•
Using RDDs, Spark hides data partitioning and offers a parallel
computation framework with an API for four mainstream
programming languages.
Comparison of Streaming Technologies
[Figure: 99th percentile latency versus throughput rate (events/sec) for Storm 0.10, Storm 0.11, Storm 0.11 (NO ACK), Flink, and Spark]
•
Benchmark is taken from reference [4].
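The two quantities in this comparison, 99th percentile latency and throughput rate, can be computed from raw measurements as sketched below. This is not the benchmark's own code. The nearest-rank percentile definition and the function names are assumptions chosen for the example, and other percentile definitions give slightly different values.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the measurements are less than or equal to it."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def throughput(event_count: int, duration_seconds: float) -> float:
    """Events processed per second over a measurement interval."""
    return event_count / duration_seconds
```

Reporting a high percentile rather than the mean highlights the slow tail of events, which tends to matter most for real-time use cases.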
Summary
•
Streaming data processing is beneficial in most scenarios where new,
dynamic data is generated on a continual basis. It applies to most of
the industry segments and big data use cases.[5]
•
Stream processing requires ingesting a sequence of data, and
incrementally updating metrics, reports, and summary statistics in
response to each arriving data record. It is better suited for real-time
monitoring and response functions.[5]
•
There are a few popular streaming data platforms, such as Apache
Storm, Apache Flink, and Apache Spark Streaming.
•
Each of the streaming platforms has its advantages and
disadvantages. Active communities for big data processing projects
continue to innovate and benefit from each other’s advancements.
Questions and Answers…
Q&A
References
•
[1] Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman, "How to Use Data Streaming
For Big Data", Dummies.com, 2017.
•
[2] Sanjai Marimadaiah (CA Technologies), “Big Data, Big Opportunity: A Primer for
Understanding The Big Data Frontier”, CA World 2015.
•
[3] Rajiv Ranjan, “Streaming Big Data Processing in Datacenter Clouds”, IEEE Cloud
Computing, vol. 1, no. 1, pp. 73-83, 2014.
•
[4] Reza Farivar, Kyle Knusbaum, “Performance Comparison of Streaming Big Data
Platforms”, DataWorks Summit/Hadoop Summit, 2016.
•
[5] “What is Streaming Data?”, https://aws.amazon.com/streaming-data/
•
[6] “Why use Storm?”, http://storm.apache.org/
•
[7] “Introduction to Flink”, https://flink.apache.org/

More Related Content

What's hot

Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating SystemsRitu Ranjan Shrivastwa
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisVincenzo Gulisano
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architecturesPooja Dixit
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2Fabio Fumarola
 
Introduction to Data Stream Processing
Introduction to Data Stream ProcessingIntroduction to Data Stream Processing
Introduction to Data Stream ProcessingSafe Software
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
Operating system Dead lock
Operating system Dead lockOperating system Dead lock
Operating system Dead lockKaram Munir Butt
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency ControlDilum Bandara
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesNatalino Busa
 
deadlock handling
deadlock handlingdeadlock handling
deadlock handlingSuraj Kumar
 

What's hot (20)

Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating Systems
 
Tree pruning
 Tree pruning Tree pruning
Tree pruning
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
Deadlock Avoidance in Operating System
Deadlock Avoidance in Operating SystemDeadlock Avoidance in Operating System
Deadlock Avoidance in Operating System
 
Servlet life cycle
Servlet life cycleServlet life cycle
Servlet life cycle
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architectures
 
Deadlock dbms
Deadlock dbmsDeadlock dbms
Deadlock dbms
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
 
Introduction to Data Stream Processing
Introduction to Data Stream ProcessingIntroduction to Data Stream Processing
Introduction to Data Stream Processing
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Paging and segmentation
Paging and segmentationPaging and segmentation
Paging and segmentation
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Operating system Dead lock
Operating system Dead lockOperating system Dead lock
Operating system Dead lock
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency Control
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologies
 
deadlock handling
deadlock handlingdeadlock handling
deadlock handling
 
MapReduce in Cloud Computing
MapReduce in Cloud ComputingMapReduce in Cloud Computing
MapReduce in Cloud Computing
 

Similar to Data Streaming For Big Data

Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationGeorge Long
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application Apache Apex
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko GlobalLogic Ukraine
 
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxData
 
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxData
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptxbetalab
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaJay Kreps
 
Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?EDB
 
Hadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureHadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureInSemble
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...AboutYouGmbH
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCHimanshu Bedi
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lightbend
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidDataWorks Summit
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022HostedbyConfluent
 

Similar to Data Streaming For Big Data (20)

Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & Manipulation
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
 
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptx
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?
 
Hadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureHadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming Architecture
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and Druid
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 

More from Seval Çapraz

A Quick Start To Blockchain by Seval Capraz
A Quick Start To Blockchain by Seval CaprazA Quick Start To Blockchain by Seval Capraz
A Quick Start To Blockchain by Seval CaprazSeval Çapraz
 
Yapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneği
Yapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneğiYapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneği
Yapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneğiSeval Çapraz
 
Assembly Dili İle Binary Search Gerçekleştirimi
Assembly Dili İle Binary Search GerçekleştirimiAssembly Dili İle Binary Search Gerçekleştirimi
Assembly Dili İle Binary Search GerçekleştirimiSeval Çapraz
 
Zimbra zooms ahead with OneView
Zimbra zooms ahead with OneViewZimbra zooms ahead with OneView
Zimbra zooms ahead with OneViewSeval Çapraz
 
Software Project Management Plan
Software Project Management PlanSoftware Project Management Plan
Software Project Management PlanSeval Çapraz
 
Distributed Computing Answers
Distributed Computing AnswersDistributed Computing Answers
Distributed Computing AnswersSeval Çapraz
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Seval Çapraz
 
Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...
Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...
Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...Seval Çapraz
 
VARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINES
VARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINESVARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINES
VARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINESSeval Çapraz
 
A Content Boosted Hybrid Recommendation System
A Content Boosted Hybrid Recommendation SystemA Content Boosted Hybrid Recommendation System
A Content Boosted Hybrid Recommendation SystemSeval Çapraz
 
Importance of software quality assurance to prevent and reduce software failu...
Importance of software quality assurance to prevent and reduce software failu...Importance of software quality assurance to prevent and reduce software failu...
Importance of software quality assurance to prevent and reduce software failu...Seval Çapraz
 
A Document Management System in Defense Industry Case Study
A Document Management System in Defense Industry Case StudyA Document Management System in Defense Industry Case Study
A Document Management System in Defense Industry Case StudySeval Çapraz
 
Comparison of Parallel Algorithms For An Image Processing Problem on Cuda
Comparison of Parallel Algorithms For An Image Processing Problem on CudaComparison of Parallel Algorithms For An Image Processing Problem on Cuda
Comparison of Parallel Algorithms For An Image Processing Problem on CudaSeval Çapraz
 
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...Seval Çapraz
 
Semantic Filtering (An Image Processing Method)
Semantic Filtering (An Image Processing Method)Semantic Filtering (An Image Processing Method)
Semantic Filtering (An Image Processing Method)Seval Çapraz
 
Optical Flow with Semantic Segmentation and Localized Layers
Optical Flow with Semantic Segmentation and Localized LayersOptical Flow with Semantic Segmentation and Localized Layers
Optical Flow with Semantic Segmentation and Localized LayersSeval Çapraz
 
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval ÇaprazSpam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval ÇaprazSeval Çapraz
 
What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?Seval Çapraz
 
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...Seval Çapraz
 

More from Seval Çapraz (20)

A Quick Start To Blockchain by Seval Capraz
A Quick Start To Blockchain by Seval CaprazA Quick Start To Blockchain by Seval Capraz
A Quick Start To Blockchain by Seval Capraz
 
Yapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneği
Yapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneğiYapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneği
Yapay Sinir Ağları ile çiftler ticareti finansal tahmin pepsi cocacola örneği
 
Etu Location
Etu LocationEtu Location
Etu Location
 
Assembly Dili İle Binary Search Gerçekleştirimi
Assembly Dili İle Binary Search GerçekleştirimiAssembly Dili İle Binary Search Gerçekleştirimi
Assembly Dili İle Binary Search Gerçekleştirimi
 
Zimbra zooms ahead with OneView
Zimbra zooms ahead with OneViewZimbra zooms ahead with OneView
Zimbra zooms ahead with OneView
 
Software Project Management Plan
Software Project Management PlanSoftware Project Management Plan
Software Project Management Plan
 
Distributed Computing Answers
Distributed Computing AnswersDistributed Computing Answers
Distributed Computing Answers
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
 
Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...
Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...
Statistical Data Analysis on Diabetes 130-US hospitals for years 1999-2008 Da...
 
VARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINES
VARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINESVARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINES
VARIABILITY MANAGEMENT IN SOFTWARE PRODUCT LINES
 
A Content Boosted Hybrid Recommendation System
A Content Boosted Hybrid Recommendation SystemA Content Boosted Hybrid Recommendation System
A Content Boosted Hybrid Recommendation System
 
Importance of software quality assurance to prevent and reduce software failu...
Importance of software quality assurance to prevent and reduce software failu...Importance of software quality assurance to prevent and reduce software failu...
Importance of software quality assurance to prevent and reduce software failu...
 
A Document Management System in Defense Industry Case Study
A Document Management System in Defense Industry Case StudyA Document Management System in Defense Industry Case Study
A Document Management System in Defense Industry Case Study
 
Comparison of Parallel Algorithms For An Image Processing Problem on Cuda
Comparison of Parallel Algorithms For An Image Processing Problem on CudaComparison of Parallel Algorithms For An Image Processing Problem on Cuda
Comparison of Parallel Algorithms For An Image Processing Problem on Cuda
 
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
 
Semantic Filtering (An Image Processing Method)
Semantic Filtering (An Image Processing Method)Semantic Filtering (An Image Processing Method)
Semantic Filtering (An Image Processing Method)
 
Optical Flow with Semantic Segmentation and Localized Layers
Optical Flow with Semantic Segmentation and Localized LayersOptical Flow with Semantic Segmentation and Localized Layers
Optical Flow with Semantic Segmentation and Localized Layers
 
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval ÇaprazSpam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
 
What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?What is Datamining? Which algorithms can be used for Datamining?
What is Datamining? Which algorithms can be used for Datamining?
 
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
 

Data Streaming For Big Data

  • 1. Data Streaming For Big Data CMP652 Next Generation Database Systems Seval Çapraz
  • 2. Content • 1. What, Why, How of Streaming Big Data • 2. Overview of Data Management Systems – Vendors, Architectures, Ecosystem • 3. The Most Popular Streaming Technologies – Apache Storm, Apache Flink, Spark Streaming • Summary • Questions and Answers • References
  • 3. 1. What, Why, How of Streaming Big Data
  • 4. What is streaming data? • Streaming data is an analytic computing platform that is focused on speed. • By streaming, data can be continuously analyzed and transformed in memory before it is stored on disk. • It is a real-time processing technique. ● All definitions are taken from reference [1]
  • 5. Why Streaming Data? • Businesses are dealing with a lot of data that needs to be processed and analyzed in real time. • Therefore, the physical environment that supports this level of responsiveness is critical. • Streaming data environments typically require a clustered hardware solution, and sometimes a massively parallel processing approach is required to handle the analysis. • The defining properties or dimensions of big data are volume, variety, and velocity. Streaming technology can cover all three. ● All definitions are taken from reference [1]
  • 6. How Does Stream Processing Work? • Stream processing is a computer programming paradigm, equivalent to dataflow programming, event stream processing, and reactive programming. • It is the real-time processing of data continuously, concurrently, and in a record-by-record fashion. • Processing streams of data works by processing “time windows” of data in memory across a cluster of servers.
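The “time windows” idea on slide 6 can be sketched in a few lines of Python. This is a minimal single-process illustration, not how any particular engine implements windowing: the function name `tumbling_window_counts`, the `(timestamp, key)` event shape, and the assumption that events arrive in timestamp order are all hypothetical choices made for the example.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) for each fixed-size, non-overlapping window.

    Assumes `events` is an iterable of (timestamp, key) pairs that arrive
    in non-decreasing timestamp order (an assumption of this sketch).
    """
    current_start = None
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            counts = defaultdict(int)
            current_start = window_start
        counts[key] += 1
    if current_start is not None:
        # Flush the last, possibly partial, window.
        yield current_start, dict(counts)

# Hypothetical events: (timestamp in seconds, event key).
events = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (14, "b"), (21, "a")]
windows = list(tumbling_window_counts(events, 10))
```

Only the counts for the current window are held in memory; a distributed engine would additionally partition the keys across the cluster and handle late or out-of-order records.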
  • 9. When to use streaming? • Some key principles define when using streams is most appropriate: – When it is necessary to determine a retail buying opportunity at the point of engagement, either via social media or via permission-based messaging – Collecting information about the movement around a secure site – To be able to react to an event that needs an immediate response, such as a service outage or a change in a patient’s medical condition – Real-time calculation of costs that are dependent on variables such as usage and available resources ● All definitions are taken from reference [1]
  • 10. Single-pass Analysis • One important factor about streaming data analysis is the fact that it is a single-pass analysis. • In other words, the analyst cannot reanalyze the data after it is streamed. • This is common in applications where you are looking for the absence of data. • If several passes are required, the data will have to be put into some sort of warehouse where additional analysis can be performed. ● All definitions are taken from reference [1]
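Single-pass analysis means each record is seen once and then discarded, so any statistic has to be maintained incrementally. As an illustration, the sketch below keeps a running mean and variance using Welford's online algorithm; the class name `RunningStats` and the sample readings are assumptions made for this example and do not come from the slides.

```python
class RunningStats:
    """Running mean and population variance, updated one record at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        # Welford's update: no earlier record is revisited or stored.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else 0.0

stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)  # each reading is processed once, then dropped
```

The state is three numbers regardless of how long the stream runs, which is what makes the approach usable when the data cannot be reanalyzed.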
  • 11. Streaming data vs. Hadoop • Processing streaming data is similar to the approach used for managing data at rest with Hadoop. • The primary difference is the issue of velocity. • In a Hadoop cluster, data is collected in batch mode and then processed. • Speed matters less in Hadoop than it does in data streaming. ● All definitions are taken from reference [1]
  • 12. 2. Overview of Database Management Systems
  • 13. Evolution of Data Management Solutions • Relational databases are not suited for big data ● All images are taken from reference [2]
  • 14. Vendor Landscape ● All images are taken from reference [2]
  • 15. An architecture of big data processing service ● All images are taken from reference [3]
  • 16. Big Data Analytics Ecosystem • Recently, each architectural layer changed dramatically in terms of the software stack when services such as Yahoo!, Twitter, and LinkedIn released open source solutions for dealing with big data. • The new architecture: – Apache Kafka serves as a high-throughput distributed in-memory messaging system in the data ingestion layer, – Apache Storm as a distributed and fault-tolerant real-time computation system in the data analytics layer, – Apache Cassandra as a NoSQL database in the data storage layer. ● All definitions are taken from reference [3]
  • 17. A simple instance of a large-scale data-stream-processing service ● All images are taken from reference [3]
  • 18. 3. The Most Popular Streaming Technologies
  • 19. Most Popular Technologies • Piping and Messaging – Apache Kafka, Apache Flume, Fluentd and ZeroMQ • Stream Processing – Apache Storm, Apache Spark, Apache Flink, Esper, Apache Samza • Machine Learning – MLlib and Mahout • Persisting – NoSQL DBs, HDFS
  • 20. Capability Analysis of Recent Open Source Stream-Processing Systems [13] L. Neumeyer et al., “S4: Distributed Stream Computing Platform,” Proc. IEEE Int’l Conf. on Data Mining Workshops, 2010, pp. 170–177. ● Table is taken from reference [3]
  • 21. [12] M. Zaharia et al., “Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters,” Proc. 4th Usenix Conf. Hot Topics in Cloud Computing, 2012.
  • 22. Some Streaming Computation Engines • Three open-source streaming engines: – Apache Storm – Apache Flink – Apache Spark Streaming ● All definitions and images are taken from reference [4]
  • 23. Apache Storm • Apache Storm is a free and open source distributed realtime computation system. • Apache Storm has the TopologyBuilder API to create a directed graph (topology) through which streams of data flow. • “Spouts” are the entry point to the graph, and “bolts” perform the processing. • Data flows through the system as individual tuples. • Graphs are not necessarily acyclic (although that is often the case). ● All definitions and images are taken from reference [4]
  • 24. ● All definitions are taken from reference [6] ● All images are taken from reference [4] • Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. • A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.
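The spout-and-bolt model on slides 23 and 24 can be mimicked with plain Python generators. The sketch below is only an analogy for the shape of a topology: it does not use Storm's TopologyBuilder API, it runs in a single process, and the function names `sentence_spout`, `split_bolt`, and `count_bolt` are hypothetical.

```python
from collections import Counter

def sentence_spout():
    """Entry point of the graph: emits one tuple per sentence."""
    for sentence in ["storm processes streams", "streams of tuples"]:
        yield (sentence,)

def split_bolt(tuples):
    """Bolt: splits each sentence tuple into individual word tuples."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield (word,)

def count_bolt(tuples):
    """Bolt: keeps a running count per word and emits (word, count)."""
    counts = Counter()
    for (word,) in tuples:
        counts[word] += 1
        yield (word, counts[word])

# Wire the graph: spout -> split bolt -> count bolt.
topology = count_bolt(split_bolt(sentence_spout()))
results = list(topology)
```

Each tuple flows through the whole chain individually, which mirrors the "data flows as individual tuples" point; a real Storm topology would also distribute the bolts across workers and repartition the stream between stages.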
  • 25. Apache Flink • Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.[7] • Apache Flink has the DataStream API to perform operations on streams of data (map, filter, reduce, join, etc.). • These operations are turned into a graph at job submission time by Flink. • It works similarly to Storm’s model. • It also supports a Storm-compatible API. ● All definitions and images are taken from reference [4]
  • 26. • Flink is designed to run on large-scale clusters with many thousands of nodes, and it also offers a standalone cluster mode. • Flink’s core is a distributed streaming dataflow engine, meaning that data is processed one event at a time rather than as a series of batches. ● All definitions and images are taken from reference [4]
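To make the "operations on streams" idea concrete, here is a toy fluent stream class with map, filter, and a keyed rolling reduce, processing one event at a time. It is a sketch under stated assumptions: the `Stream` class, its method names, and the sensor readings are invented for illustration and are not Flink's DataStream API.

```python
class Stream:
    """Toy event-at-a-time stream with chained, lazily evaluated operators."""

    def __init__(self, source):
        self._source = source  # any iterator of events

    def map(self, fn):
        return Stream(fn(event) for event in self._source)

    def filter(self, predicate):
        return Stream(event for event in self._source if predicate(event))

    def rolling_reduce(self, key_fn, reduce_fn):
        """Emit the updated per-key aggregate after every incoming event."""
        def generate():
            state = {}
            for event in self._source:
                key = key_fn(event)
                state[key] = reduce_fn(state[key], event) if key in state else event
                yield state[key]
        return Stream(generate())

    def collect(self):
        return list(self._source)

# Hypothetical (sensor_id, value) readings.
readings = [("s1", 20), ("s2", -5), ("s1", 22), ("s2", 30)]
output = (
    Stream(iter(readings))
    .filter(lambda r: r[1] >= 0)                 # drop invalid readings
    .map(lambda r: (r[0], r[1] * 2))             # rescale each value
    .rolling_reduce(lambda r: r[0],              # running sum per sensor
                    lambda a, b: (a[0], a[1] + b[1]))
    .collect()
)
```

Chaining the calls only builds a pipeline of generators; nothing runs until `collect()` pulls events through, which loosely parallels operators being turned into a graph before execution.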
  • 27. Apache Spark Streaming • Apache Spark is a fast and general engine for large-scale data processing. • Apache Spark has the DStream API to perform operations on streams of data (map, filter, reduce, join, etc.). It is based on Spark’s RDD (Resilient Distributed Dataset) abstraction. • The API is similar to Flink’s; however, streaming is accomplished through micro-batches. • A Spark Streaming job consists of one small batch after another. ● All definitions and images are taken from reference [4]
  • 28. • A Resilient Distributed Dataset (RDD) is the basic abstraction in Spark. • Using RDDs, Spark hides data partitioning and provides a parallel computation framework with APIs for four mainstream programming languages. ● All definitions and images are taken from reference [4]
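The micro-batch approach can be contrasted with event-at-a-time processing using a small sketch. Note the simplification: Spark Streaming forms batches by time interval, whereas this example groups by a fixed record count to stay deterministic; the helper `micro_batches` is a hypothetical name and no Spark API is used.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an incoming stream into consecutive small batches.

    Batching by count is an assumption of this sketch; a real micro-batch
    engine would typically cut batches on a time interval instead.
    """
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Each small batch is processed as a unit, one batch after another.
batch_sums = [sum(batch) for batch in micro_batches(range(1, 8), 3)]
```

Because results are produced once per batch, latency is bounded below by the batch size or interval, which is the main trade-off against one-event-at-a-time engines.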
  • 29. Comparison of Streaming Technologies [Figure: 99th percentile latency versus throughput rate (events/sec) for Storm 0.10, Storm 0.11, Storm 0.11 (no ack), Flink, and Spark] ● Benchmark is taken from reference [4]
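The benchmark reports 99th percentile latency. As a reminder of what that metric means, the sketch below computes a percentile with the nearest-rank method; the function name `percentile` and the latency values are made up for illustration and are not data from reference [4], which may use a different percentile definition.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    # Rank is 1-based; multiply before dividing to keep the arithmetic exact for integers.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical per-event latencies in milliseconds (1 ms to 100 ms).
latencies_ms = list(range(1, 101))
p99 = percentile(latencies_ms, 99)
```

A high percentile is used instead of the mean because a streaming system can have a good average latency while a small fraction of events still wait much longer.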
  • 30. Summary • Streaming data processing is beneficial in most scenarios where new, dynamic data is generated on a continual basis. It applies to most industry segments and big data use cases.[5] • Stream processing requires ingesting a sequence of data, and incrementally updating metrics, reports, and summary statistics in response to each arriving data record. It is better suited for real-time monitoring and response functions.[5] • There are a few popular streaming data platforms, such as Apache Storm, Apache Flink, and Apache Spark Streaming. • Each of the streaming platforms has its advantages and disadvantages. Active communities for big data processing projects continue to innovate and benefit from each other’s advancements.
  • 32. References • [1] Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman, "How to Use Data Streaming For Big Data", Dummies.com, 2017. • [2] Sanjai Marimadaiah (CA Technologies), “Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier”, CA World 2015. • [3] Rajiv Ranjan, “Streaming Big Data Processing in Datacenter Clouds”, IEEE Cloud Computing, vol. 1, no. 1, pp. 73-83, 2014. • [4] Reza Farivar, Kyle Knusbaum, “Performance Comparison of Streaming Big Data Platforms”, DataWorks Summit/Hadoop Summit, 2016. • [5] “What is Streaming Data?”, https://aws.amazon.com/streaming-data/ • [6] “Why use Storm?”, http://storm.apache.org/ • [7] “Introduction to Flink”, https://flink.apache.org/