Module-1
Introduction to Big Data and STORM
www.edureka.in/apache-storm
How it Works?
LIVE Online Class
Class Recording in LMS
24/7 Post-Class Support
Module-Wise Quiz and Assignment
Project Work on Large Data Set
Verifiable Certificate
Course Topics
• Module 1
» Introduction to Big Data and Storm
• Module 2
» Storm Technology Stack and Groupings
• Module 3
» Spouts and Bolts
• Module 4
» Trident Topologies
• Module 5
» Real-Life Storm Project 1
• Module 6
» Real-Life Storm Project 2
Objectives
At the end of this module, you will be able to:
Recall Big Data and Hadoop
Understand Batch and Real-time Analytics of Big Data
Investigate the Shortcomings of Hadoop
Understand Lambda Architecture
Develop a basic knowledge of Apache Storm and its components
Explain the Use Cases and Key Differentiators of Storm
Big Data
Storm is an open-source computing system for real-time Big Data analytics.
Let’s understand Big Data first before learning Storm.
• Lots of data: terabytes or petabytes
• Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
What is Big Data?
• Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
• For example, the stock market generates about one terabyte of new trade data per day, which is analysed to determine trends for optimal trades.
What is Big Data?
• 2,500 exabytes of new information were generated in 2012, with the Internet as the primary driver.
• The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year.
Un-structured Data is Exploding
IBM’s Definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
VOLUME, VELOCITY, VARIETY
Examples of unstructured data: web logs, images, videos, sensor data, audio
Annie’s Introduction
Hello there! My name is Annie. I love quizzes and puzzles, and I am here to make you guys think and answer my questions.
Annie’s Question
Map the following to the corresponding data type:
- XML files
- Word docs, PDF files, Text files
- E-Mail body
- Data from Enterprise systems (ERP, CRM etc.)
Annie’s Answer
XML files -> Semi-structured Data
Word docs, PDF files, Text files -> Unstructured Data
E-Mail body -> Unstructured Data
Data from Enterprise systems (ERP, CRM etc.) -> Structured Data
Big Data Batch Analytics
Hadoop and its primary programming model, MapReduce, are great for batch-oriented processing of huge amounts of data.
As data grows, Hadoop lets you scale your cluster horizontally by adding commodity nodes and thus keep up with query workloads.
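As a rough illustration of the batch-oriented MapReduce model described above, the following plain-Python sketch counts page views per URL in three phases. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_batch_job`) and the sample log are assumptions made for this example, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(log_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (url, 1) pair for every non-empty log line."""
    return [(line.strip(), 1) for line in log_lines if line.strip()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def run_batch_job(log_lines: Iterable[str]) -> Dict[str, int]:
    """Run the whole batch: nothing is available until every line has been processed."""
    return reduce_phase(shuffle_phase(map_phase(log_lines)))


if __name__ == "__main__":
    # Hypothetical access log: one requested URL per line.
    sample_log = ["/blog/storm", "/blog/hadoop", "/blog/storm"]
    print(run_batch_job(sample_log))
```

The point of the sketch is the shape of the computation: the result only exists once the complete input has been read, which is what makes the model a good fit for batch work and a poor fit for data that is still arriving.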
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management framework with scale-out storage and distributed processing.
Hadoop Eco-System
» Apache Oozie (Workflow)
» Hive (DW System)
» Pig Latin (Data Analysis)
» MapReduce Framework
» Other YARN Frameworks (MPI, Giraph)
» HBase
» YARN (Cluster Resource Management)
» HDFS (Hadoop Distributed File System)
Big Data Batch Analytics
This evolution has forced the addition of support for:
» Higher-level languages (Pig and Hive)
» New real-time storage engines (HBase)
» Extensions for streaming data (Hadoop Streaming)
Hadoop for Batch Analytics
Because it is built for batch processing, Hadoop should be deployed in situations such as:
» Index building
» Pattern recognition
» Creating recommendation engines
» Sentiment analysis
These are situations that generate huge amounts of data, which is stored and then queried.
Real-time Big Data Analytics
Social Networking:
» Pick your own Big Data database (RDBMS or NoSQL).
» Measure the immediate impact on your site traffic from social media, whether a new blog post, a tweet, a “Like”, or even a comment.
» Knowing this information translates to better conversion and more effective online campaigns.
Real-time Big Data Analytics
SaaS:
» Measuring user behaviour and acting upon it is crucial for improving customer satisfaction and conversion rates, which represent immediate increases in revenue.
Real-time Big Data Analytics
Financial Services:
» Determining in real time whether your portfolio is losing money, or whether there is fraud in your system, means that you can prevent disasters as they occur, not after the damage is done.
» Correlating multiple sources from the market in real time results in a more accurate view of the market and enables more accurate actions to maximize your profit.
Real Time Big Data Analytics - Options
» Apache Storm
» Amazon Kinesis
Need for Real-time Analytics
Problem Statement:
To find the total number of page views of Edureka’s blog over a range of time.
Google Analytics can provide this information.
Example: For a particular day, the data can be:
Figure: All Data (petabyte scale)
Need for Real-time Analytics
Challenge:
Querying huge amounts of historical data is slow.
Figure: All Data → Precomputed View → Query
Need for Real-time Analytics
Solution:
Precompute historical data.
Need for Real-time Analytics
Google Analytics might have to keep the historical data for each hour as a precomputed view.
Figure: All Data (individual page views), Query
Precomputed View
URL | Hour of the day | No. of page views
edureka.in/blog/aboutapachestorm | 1 | 250
edureka.in/blog/aboutapachestorm | 2 | 300
edureka.in/blog/aboutapachestorm | 3 | 455
edureka.in/blog/aboutapachestorm | 4 | 460
edureka.in/blog/aboutapachestorm | 5 | 320
edureka.in/blog/aboutapachestorm | 6 | 111
edureka.in/blog/aboutapachestorm | 7 | 129
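To make the idea of a precomputed view concrete, here is a minimal Python sketch. It assumes the raw data is a sequence of (URL, hour) page-view events and that the view is a dictionary keyed by (URL, hour); the function names `precompute_hourly_view` and `query_page_views` are hypothetical and chosen only for this example. The counts in `TABLE_VIEW` are copied from the table above.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PageKey = Tuple[str, int]  # (url, hour of the day)


def precompute_hourly_view(raw_events: Iterable[PageKey]) -> Dict[PageKey, int]:
    """Batch step: collapse raw page-view events into one count per (url, hour)."""
    return dict(Counter(raw_events))


def query_page_views(view: Dict[PageKey, int], url: str,
                     first_hour: int, last_hour: int) -> int:
    """Query step: sum the precomputed counts for an inclusive range of hours."""
    return sum(view.get((url, hour), 0) for hour in range(first_hour, last_hour + 1))


URL = "edureka.in/blog/aboutapachestorm"

# Hourly counts copied from the Precomputed View table.
TABLE_VIEW: Dict[PageKey, int] = {
    (URL, 1): 250, (URL, 2): 300, (URL, 3): 455, (URL, 4): 460,
    (URL, 5): 320, (URL, 6): 111, (URL, 7): 129,
}

if __name__ == "__main__":
    # Hours 1 to 3: 250 + 300 + 455 = 1005
    print(query_page_views(TABLE_VIEW, URL, 1, 3))
```

A range query now touches at most one entry per hour instead of every raw event, which is why the query stays fast even when the underlying data set is very large.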
Need for Real-time Analytics
Figure: All Data → Precomputed View (using Hadoop) → Query
But what about the data generated after the last precomputed view?
Need for Real-time Analytics
Compensating for the last few hours of data
Figure: Real-time Data → Storm (spout, bolts) → Real-time View (stored)
Need for Real-time Analytics
Figure: All Data → Hadoop → Precomputed Batch View → Query; New Data Stream → Storm → Precomputed Real-time View → Query
Lambda Architecture
All data entering the system is dispatched to both the batch layer and the speed layer for processing.
Figure: Lambda Architecture, step 1 (New Data, Batch Layer, Speed Layer, Serving Layer)
Lambda Architecture
The batch layer has two functions:
» managing the master dataset (an immutable, append-only set of raw data), and
» precomputing the batch views.
The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.
Figure: Lambda Architecture, steps 1 to 3 (New Data, Batch Layer with Master Dataset, Serving Layer with Batch Views, Speed Layer)
Lambda Architecture
The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
Figure: Lambda Architecture, steps 1 to 4 (New Data, Batch Layer with Master Dataset, Serving Layer with Batch Views, Speed Layer with Real-time Views)
Lambda Architecture
Any incoming query can be answered by merging results from batch views and real-time views.
Figure: Lambda Architecture, steps 1 to 5 (New Data, Batch Layer with Master Dataset, Serving Layer with Batch Views, Speed Layer with Real-time Views, Query)
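The merge step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a reference implementation: both views are modelled as dictionaries of page-view counts keyed by (URL, hour), the batch view and the real-time view are assumed to cover different events so that their counts can simply be added, and the names `merge_views` and `answer_query` as well as the sample numbers for hour 3 are made up for the example.

```python
from typing import Dict, Tuple

ViewKey = Tuple[str, int]  # (url, hour of the day)


def merge_views(batch_view: Dict[ViewKey, int],
                realtime_view: Dict[ViewKey, int]) -> Dict[ViewKey, int]:
    """Combine both views; counts for the same key are added together."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged


def answer_query(batch_view: Dict[ViewKey, int],
                 realtime_view: Dict[ViewKey, int],
                 url: str, first_hour: int, last_hour: int) -> int:
    """Answer a page-view query over an inclusive hour range from the merged views."""
    merged = merge_views(batch_view, realtime_view)
    return sum(merged.get((url, hour), 0) for hour in range(first_hour, last_hour + 1))


if __name__ == "__main__":
    url = "edureka.in/blog/aboutapachestorm"
    # Batch view: hours already covered by the last batch run.
    batch = {(url, 1): 250, (url, 2): 300}
    # Real-time view: recent data the batch layer has not seen yet (made-up count).
    realtime = {(url, 3): 40}
    print(answer_query(batch, realtime, url, 1, 3))  # 250 + 300 + 40 = 590
```

Once a later batch run has absorbed hour 3 into the batch view, the corresponding real-time entries would have to be discarded to avoid double counting; the sketch leaves that bookkeeping out.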
What is Storm?
Storm is a distributed, reliable, fault-tolerant system for processing streams of data.
What is Storm?
The work is delegated to different types of components, each responsible for a simple, specific processing task.
The input stream of a Storm cluster is handled by a component called a spout.
The spout passes the data to a component called a bolt, which transforms it in some way.
A bolt either persists the data in some sort of storage or passes it to another bolt.
Figure: Input Data Source → spouts → bolts (transform and pass data) → data storage
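The flow from spout to bolts to storage can be mimicked with ordinary Python generators. The sketch below is only a single-process analogy and does not use Storm’s API: `sentence_spout`, `split_bolt`, `count_bolt`, and `run_pipeline` are hypothetical names, and a plain dictionary stands in for the data storage.

```python
from typing import Dict, Iterable, Iterator


def sentence_spout(source: Iterable[str]) -> Iterator[str]:
    """Spout: reads from the input source and passes each item downstream."""
    for sentence in source:
        yield sentence


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt: transforms each sentence into lower-case words and passes them on."""
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word


def count_bolt(words: Iterable[str], storage: Dict[str, int]) -> None:
    """Bolt: persists a running count per word in the given storage."""
    for word in words:
        storage[word] = storage.get(word, 0) + 1


def run_pipeline(source: Iterable[str]) -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt and return the storage contents."""
    storage: Dict[str, int] = {}
    count_bolt(split_bolt(sentence_spout(source)), storage)
    return storage


if __name__ == "__main__":
    print(run_pipeline(["Storm processes streams", "storm is fast"]))
```

Because each stage is a generator, every sentence flows through the whole chain as soon as it is read, rather than waiting for the complete input as a batch job would.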
Annie’s Question
Storm can be used in:
- Real-time Processing
- Batch Processing
- Both
Annie’s Answer
Real-time Processing
Annie’s Question
Which of these can be a source of a stream?
- Spout
- Bolt
- Both
Annie’s Answer
Both
Annie’s Question
It is not possible to run Storm processes alongside MapReduce jobs inside a Hadoop cluster.
- True
- False
Annie’s Answer
False. With Hadoop 2.0, it is possible.
Storm Components
A Storm cluster has 3 sets of nodes:
1. Nimbus node
2. ZooKeeper nodes
3. Supervisor nodes
Nimbus node (master node, similar to the Hadoop JobTracker):
» Uploads computations for execution
» Distributes code across the cluster
» Launches workers across the cluster
» Monitors computation and reallocates workers as needed
ZooKeeper nodes:
» Coordinate the Storm cluster
Supervisor nodes:
» Communicate with Nimbus through ZooKeeper
» Start and stop workers according to signals from Nimbus
Annie’s Question
A Nimbus node is similar to a TaskTracker node in a Hadoop cluster.
- True
- False
Annie’s Answer
False. A Nimbus node is more like a JobTracker node in Hadoop.
Storm Components
Five key abstractions help to understand how Storm processes data:
» Tuples: an ordered list of elements. For example, a “4-tuple” might be (7, 1, 3, 7).
» Streams: an unbounded sequence of tuples.
» Spouts: sources of streams in a computation (e.g. a Twitter API).
» Bolts: process input streams and produce output streams. They can run functions; filter, aggregate, or join data; or talk to databases.
» Topologies: the overall calculation, represented visually as a network of spouts and bolts.
Storm users define topologies that describe how to process the data as it streams in from the spouts.
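The five abstractions can be modelled in a small, self-contained Python sketch. Everything here is an illustrative assumption rather than Storm’s real API: `ToyTopology`, `add_spout`, `add_bolt`, and `build_example` are hypothetical names, the stream is finite so that the example terminates (real streams are unbounded), and the network is assumed to contain no cycles.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

StormTuple = Tuple[int, ...]  # an ordered list of elements, e.g. (7, 1, 3, 7)


class ToyTopology:
    """A network of named spouts and bolts, run locally in one process."""

    def __init__(self) -> None:
        self.spouts: Dict[str, Callable[[], Iterable[StormTuple]]] = {}
        self.bolts: Dict[str, Callable[[StormTuple], List[StormTuple]]] = {}
        self.subscribers: Dict[str, List[str]] = defaultdict(list)

    def add_spout(self, name: str, spout: Callable[[], Iterable[StormTuple]]) -> None:
        """Register a source of a stream."""
        self.spouts[name] = spout

    def add_bolt(self, name: str, bolt: Callable[[StormTuple], List[StormTuple]],
                 sources: List[str]) -> None:
        """Register a bolt and subscribe it to the output of the given components."""
        self.bolts[name] = bolt
        for source in sources:
            self.subscribers[source].append(name)

    def run(self) -> Dict[str, List[StormTuple]]:
        """Push every tuple through the network; return what each component emitted."""
        emitted: Dict[str, List[StormTuple]] = defaultdict(list)
        queue: Deque[Tuple[str, StormTuple]] = deque(
            (name, tup) for name, spout in self.spouts.items() for tup in spout()
        )
        while queue:
            source, tup = queue.popleft()
            emitted[source].append(tup)
            for bolt_name in self.subscribers.get(source, []):
                for out in self.bolts[bolt_name](tup):
                    queue.append((bolt_name, out))
        return dict(emitted)


def build_example() -> ToyTopology:
    """Spout of 4-tuples -> bolt that sums each tuple -> bolt that keeps sums above 10."""
    topology = ToyTopology()
    topology.add_spout("numbers", lambda: [(7, 1, 3, 7), (2, 2, 2, 2)])
    topology.add_bolt("sum", lambda tup: [(sum(tup),)], sources=["numbers"])
    topology.add_bolt("big", lambda tup: [tup] if tup[0] > 10 else [], sources=["sum"])
    return topology


if __name__ == "__main__":
    for component, tuples in build_example().run().items():
        print(component, tuples)
```

In this toy model the topology is just the wiring between names: the spout and bolt functions know nothing about each other, which mirrors how a topology describes the overall calculation as a network rather than as a single program.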
Annie’s Question
A Storm topology is defined in terms of
- Nimbus, Zookeeper, Supervisor nodes
- Spout, Bolt
- Spout, Bolt, Nimbus, Zookeeper, Supervisor nodes
- Spout, Bolt, Zookeeper node
Annie’s Answer
Spout and Bolt
Use Cases of Storm
Processing Streams: Unlike other stream processing systems, with Storm there’s no need for intermediate queues.
Continuous Computation: Send data to clients continuously so they can update and show results in real time, such as site metrics.
Distributed Remote Procedure Call: Easily parallelize CPU-intensive operations.
Use Cases of Storm
• Financial Services
» Securities Fraud
» Compliance Violations
» Order Routing
» Pricing
• Telecom
» Security Breaches
» Network Outages
» Bandwidth Allocation
» Customer Service
• Retail
» Shrinkage
» Stock-outs
» Offers
» Pricing
• Web
» Application Failure
» Operational Issues
» Personalized Content
These industries can use Storm to prevent certain outcomes or to optimize their objectives.
Key Differentiators
Simple to Program: It’s painful to do real-time processing from scratch. With Storm, complexity is reduced drastically.
Support for Multiple Programming Languages: It’s easier to develop in a JVM-based language, but Storm supports any language as long as you use or implement a small intermediary library.
Fault-tolerant: The Storm cluster takes care of workers going down, reassigning tasks when necessary.
Assignment
Try setting up a single-node Storm cluster on your system as shown in the LMS Apache Storm single-node cluster installation document.
Pre-work
Install Ubuntu in VMware Player on your system.
Install a single-node Storm cluster on your system.
What’s within the LMS
This section gives you an overview of the Apache Storm course.
What’s within the LMS
Click here to expand and view all the elements of this Module.
What’s within the LMS
Assignment
Pre-work
Quiz
edureka !
/•

More Related Content

What's hot

Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
Rodney Joyce
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
Databricks
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
GoDataDriven
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
Databricks
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
Mykola Zerniuk
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
DataWorks Summit/Hadoop Summit
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Edureka!
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
Databricks
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Databricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
Databricks
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
Databricks
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
SnapLogic
 
Modern Data Warehouse with Azure Synapse.pdf
Modern Data Warehouse with Azure Synapse.pdfModern Data Warehouse with Azure Synapse.pdf
Modern Data Warehouse with Azure Synapse.pdf
Keyla Dolores Méndez
 

What's hot (20)

Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
 
Modern Data Warehouse with Azure Synapse.pdf
Modern Data Warehouse with Azure Synapse.pdfModern Data Warehouse with Azure Synapse.pdf
Modern Data Warehouse with Azure Synapse.pdf
 

Viewers also liked

Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
DataWorks Summit/Hadoop Summit
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Slim Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
Slim Baltagi
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
Slim Baltagi
 

Viewers also liked (6)

Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 

Similar to Apache Storm

Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
Edureka!
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Nati Shalom
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Alluxio, Inc.
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
Edureka!
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
jimliddle
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
HostedbyConfluent
 
Data Engineer's Lunch #63: Building a Cryptocurrency Data Catalogue
Data Engineer's Lunch #63: Building a Cryptocurrency Data CatalogueData Engineer's Lunch #63: Building a Cryptocurrency Data Catalogue
Data Engineer's Lunch #63: Building a Cryptocurrency Data Catalogue
Anant Corporation
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...
Big Data Spain
 
How it works- Data Science
How it works- Data ScienceHow it works- Data Science
How it works- Data Science
Edureka!
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop
Edureka!
 
NoSQL
NoSQLNoSQL
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
Josh Patterson
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015
Edureka!
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
Edureka!
 
Hadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsHadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionals
Edureka!
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 

Similar to Apache Storm (20)

Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
Data Engineer's Lunch #63: Building a Cryptocurrency Data Catalogue
Data Engineer's Lunch #63: Building a Cryptocurrency Data CatalogueData Engineer's Lunch #63: Building a Cryptocurrency Data Catalogue
Data Engineer's Lunch #63: Building a Cryptocurrency Data Catalogue
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...
 
How it works- Data Science
How it works- Data ScienceHow it works- Data Science
How it works- Data Science
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop
 
NoSQL
NoSQLNoSQL
NoSQL
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
Hadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsHadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionals
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 

Apache Storm

  • 1. Module-1 Introduction to Big Data and STORM www.edureka.in/apache-storm
  • 2. How it Works? LIVE On-line Class; Class Recording in LMS; 24/7 Post Class Support; Module Wise Quiz and Assignment; Project Work on Large Data Set; Verifiable Certificate
  • 3. Course Topics: Module 1 » Introduction to Big Data and Storm; Module 2 » Storm Technology Stack and Groupings; Module 3 » Spouts and Bolts; Module 4 » Trident Topologies; Module 5 » Real Life Storm Project -1; Module 6 » Real Life Storm Project -2
  • 4. Objectives: At the end of this module, you will be able to: Recall Big Data and Hadoop; Understand Batch and Real-time Analytics of Big Data; Investigate the Shortcomings of Hadoop; Understand Lambda Architecture; Develop a basic knowledge of Apache Storm and its components; Explain the Use Cases and Key Differentiators of Storm
  • 5. Big Data: Storm is an open-source computing system used for real-time Big Data analytics. Let's understand Big Data first in order to learn Storm.
  • 6. What is Big Data? Lots of data: terabytes or petabytes. Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
  • 7. What is Big Data? Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information. A stock market generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine trends for optimal trades.
  • 8. Un-structured Data is Exploding: 2,500 exabytes of new information in 2012, with the Internet as the primary driver. The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year.
  • 9. IBM’s Definition: Big Data Characteristics (http://www-01.ibm.com/software/data/bigdata/): Volume, Velocity, Variety. Example data: web logs, images, videos, sensor data, audio.
  • 10. Annie’s Introduction: Hello there! My name is Annie. I love quizzes and puzzles, and I am here to make you guys think and answer my questions.
  • 11. Annie’s Question: Map the following to the corresponding data type: - XML files - Word docs, PDF files, Text files - E-Mail body - Data from Enterprise systems (ERP, CRM etc.)
  • 12. Annie’s Answer: XML files -> Semi-structured Data; Word docs, PDF files, Text files -> Unstructured Data; E-Mail body -> Unstructured Data; Data from Enterprise systems (ERP, CRM etc.) -> Structured Data
  • 13. Big Data Batch Analytics: Hadoop and its primary programming model, MapReduce, are great for batch-oriented processing of huge amounts of data. As data grows, Hadoop lets you scale your cluster horizontally by adding commodity nodes and thus keep up with query workloads.
  • 14. What is Hadoop? Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is an open-source data management framework with scale-out storage and distributed processing.
  • 15. Hadoop Eco-System: HDFS (Hadoop Distributed File System); YARN (cluster resource management); MapReduce framework; other YARN frameworks (MPI, Giraph); HBase; Hive (DW system); Pig Latin (data analysis); Apache Oozie (workflow)
  • 16. Big Data Batch Analytics: This evolution has forced the addition of support for higher-level languages (Pig & Hive), new real-time storage engines (HBase), and extensions for streaming data (Hadoop Streaming).
  • 17. Hadoop for Batch Analytics: Because it is a batch-processing system, Hadoop should be deployed in situations such as index building, pattern recognition, creating recommendation engines, and sentiment analysis. These are situations in which a huge amount of data is generated, stored, and then queried.
  • 18. Real-time Big Data Analytics: Social Networking: » Pick your own Big Data database (RDBMS or NoSQL) » Measure the immediate impact to your site traffic from social media, whether a new blog post, a tweet, a “Like”, or even a comment. » Knowing this information translates to better conversion and more effective online campaigns.
  • 19. Real-time Big Data Analytics: SaaS: » Measuring user behaviour and acting upon it is crucial for improving customer satisfaction and conversion rates, which represent immediate increases in revenue.
  • 20. Real-time Big Data Analytics: Financial Services: » Determining in real time whether your portfolio is losing money, or if there is fraud in your system, means that you can prevent disasters as they occur, not after the damage is done. » Correlating multiple sources from the market in real time results in a more accurate view of the market and enables more accurate actions to maximize your profit.
  • 21. Real Time Big Data Analytics - Options: Apache Storm; Amazon Kinesis
  • 22. Need for Real-time Analytics: Problem Statement: To find the total number of page views of Edureka’s blog over a range of time. Google Analytics can provide you this information. Example: For a particular day, the data can be:
  • 23. Need for Real-time Analytics: Challenge: Querying a huge amount of historical data (all data, petabyte scale) is slow.
  • 24. Need for Real-time Analytics: Solution: Precompute historical data (All Data -> Precomputed View -> Query).
  • 25. Need for Real-time Analytics: Google Analytics might have to keep the historical data for each hour as a precompiled view (All Data -> Precomputed View -> Query). Precomputed View:
      URL                                 Hr of the day   No. of pageviews
      edureka.in/blog/aboutapachestorm    1               250
      edureka.in/blog/aboutapachestorm    2               300
      edureka.in/blog/aboutapachestorm    3               455
      edureka.in/blog/aboutapachestorm    4               460
      edureka.in/blog/aboutapachestorm    5               320
      edureka.in/blog/aboutapachestorm    6               111
      edureka.in/blog/aboutapachestorm    7               129
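The hourly precomputed view on slide 25 can be sketched in plain Python. This is only an illustration under stated assumptions: the event format (URL plus ISO timestamp), the function names `precompute_hourly_view` and `pageviews`, and the sample events are all hypothetical, and a real batch job would run on Hadoop over a far larger dataset.

```python
from collections import Counter
from datetime import datetime


def precompute_hourly_view(events):
    """Batch step: fold raw (url, ISO timestamp) page-view events
    into a {(url, hour_of_day): count} view."""
    view = Counter()
    for url, timestamp in events:
        hour = datetime.fromisoformat(timestamp).hour
        view[(url, hour)] += 1
    return view


def pageviews(view, url, start_hour, end_hour):
    """Query step: answer a range query from the view alone
    (both hours inclusive), without rescanning the raw events."""
    return sum(view[(url, hour)] for hour in range(start_hour, end_hour + 1))


# Hypothetical raw events, invented for this sketch.
URL = "edureka.in/blog/aboutapachestorm"
events = [
    (URL, "2014-01-01T01:05:00"),
    (URL, "2014-01-01T01:40:00"),
    (URL, "2014-01-01T02:10:00"),
]

view = precompute_hourly_view(events)
total = pageviews(view, URL, 1, 2)
```

The query touches one small entry per hour instead of every raw event, which is the reason for precomputing. Events that arrive after the view was built are not counted until the next batch run.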
  • 26. Need for Real-time Analytics: The precomputed view is built from all data using Hadoop (All Data -> Precomputed View -> Query).
  • 27. Need for Real-time Analytics: But what about the data generated after the last precompiled view?
  • 28. Need for Real-time Analytics: Compensating for the last few hours of data: Storm (spout -> bolts) turns the real-time data into a real-time view.
  • 29. Need for Real-time Analytics: Hadoop turns all data into a precomputed batch view; Storm turns the new data stream into a precomputed real-time view; a query reads both views.
  • 30. Lambda Architecture: (1) All data entering the system is dispatched to both the batch layer and the speed layer for processing.
  • 31. Lambda Architecture: (2) The batch layer has two functions: » managing the master dataset (an immutable, append-only set of raw data), and » pre-computing the batch views. (3) The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.
  • 32. Lambda Architecture: (4) The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
  • 33. Lambda Architecture: (5) Any incoming query can be answered by merging results from batch views and real-time views.
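The Lambda Architecture steps can be sketched as a small in-memory model. This is a toy under loud assumptions, not how Hadoop or Storm implement the layers: the class name `LambdaCounter`, its methods, and the sample URLs are hypothetical, and resetting the real-time view at the moment a batch run finishes is a simplification.

```python
class LambdaCounter:
    """Toy page-view counter with a batch view and a real-time view."""

    def __init__(self):
        self.master = []         # master dataset: append-only raw events
        self.batch_view = {}     # rebuilt from the master dataset by run_batch()
        self.realtime_view = {}  # counts for events seen since the last batch run

    def ingest(self, url):
        # New data is dispatched to both the batch layer and the speed layer.
        self.master.append(url)
        self.realtime_view[url] = self.realtime_view.get(url, 0) + 1

    def run_batch(self):
        # Batch layer: recompute the batch view from the whole master dataset.
        view = {}
        for url in self.master:
            view[url] = view.get(url, 0) + 1
        self.batch_view = view
        # Assumption of this sketch: the batch view now covers every event,
        # so the real-time counts can be dropped in one step.
        self.realtime_view = {}

    def query(self, url):
        # A query merges the batch view with the real-time view.
        return self.batch_view.get(url, 0) + self.realtime_view.get(url, 0)


counter = LambdaCounter()
counter.ingest("blog/a")
counter.ingest("blog/a")
counter.run_batch()        # batch view now holds blog/a -> 2
counter.ingest("blog/a")   # arrives after the batch run
counter.ingest("blog/b")
```

After these calls, `counter.query("blog/a")` combines 2 from the batch view with 1 from the real-time view, so recent data is visible without waiting for the next batch run.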
  • 34. What is Storm? Storm is a distributed, reliable, fault-tolerant system for processing streams of data.
  • 35. What is Storm? The work is delegated to different types of components that are each responsible for a simple specific processing task. The input stream of a Storm cluster is handled by a component called a spout. The spout passes the data to a component called a bolt, which transforms it in some way. A bolt either persists the data in some sort of storage, or passes it to some other bolt. (Input data source -> spout -> bolts -> data storage.)
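The spout-to-bolt flow described on slide 35 can be imitated with Python generators. This is a plain-Python analogy, not the Storm API: the names `sentence_spout`, `split_bolt`, and `store_bolt`, the in-memory list used as storage, and the sample sentences are all assumptions made for illustration.

```python
def sentence_spout(lines):
    """Spout: reads the input source and emits one tuple per line."""
    for line in lines:
        yield (line,)


def split_bolt(stream):
    """Bolt: transforms each sentence tuple into one tuple per word
    and passes the result on to the next bolt."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def store_bolt(stream, storage):
    """Final bolt: persists each tuple instead of passing it on."""
    for (word,) in stream:
        storage.append(word)


storage = []  # stands in for "some sort of storage"
source = ["Storm processes streams", "streams of data"]
store_bolt(split_bolt(sentence_spout(source)), storage)
```

Each stage does one simple task and hands its output to the next, which mirrors the division of work between spouts and bolts. A real Storm cluster runs these components in parallel across machines, which this single-process sketch does not attempt to show.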
  • 36. Annie’s Question: Storm can be used in: - Real-time Processing - Batch Processing - Both
  • 37. Annie’s Answer: Real-time Processing
  • 38. Annie’s Question: Which of them can be a source of a stream? - Spout - Bolt - Both
  • 39. Annie’s Answer: Both
  • 40. Annie’s Question: It is not possible to run Storm processes alongside MapReduce jobs inside a Hadoop cluster. - True - False
  • 41. Annie’s Answer: False. With Hadoop 2.0, it is possible.
  • 42. Storm Components: A Storm cluster has 3 sets of nodes: 1. Nimbus node 2. ZooKeeper nodes 3. Supervisor nodes
      Nimbus node (master node, similar to the Hadoop JobTracker):
      » Uploads computations for execution
      » Distributes code across the cluster
      » Launches workers across the cluster
      » Monitors computation and reallocates workers as needed
      ZooKeeper nodes:
      » Coordinate the Storm cluster
      Supervisor nodes:
      » Communicate with Nimbus through ZooKeeper, and start and stop workers according to signals from Nimbus
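As a toy illustration of "reallocates workers as needed", the sketch below spreads workers over the supervisors that are still alive. Everything here is an assumption: the slides do not describe Storm's actual scheduling policy, so round-robin placement, the function name `assign_workers`, and the worker and supervisor ids are hypothetical.

```python
def assign_workers(workers, supervisors):
    """Spread worker ids over the live supervisors in round-robin order.

    Assumption: round-robin is used only to keep the sketch short;
    it is not claimed to be Storm's scheduling algorithm.
    """
    if not supervisors:
        raise ValueError("no live supervisors to run workers on")
    assignment = {supervisor: [] for supervisor in supervisors}
    for index, worker in enumerate(workers):
        target = supervisors[index % len(supervisors)]
        assignment[target].append(worker)
    return assignment


workers = ["w1", "w2", "w3", "w4"]

# Initial placement across three supervisor nodes.
before = assign_workers(workers, ["sup-a", "sup-b", "sup-c"])

# Suppose sup-b stops reporting: the master places the same
# workers again, using only the supervisors that remain.
after = assign_workers(workers, ["sup-a", "sup-c"])
```

The point of the sketch is only that the master keeps every worker running somewhere after a node is lost. This version recomputes the whole placement, whereas a production scheduler would also try to avoid moving workers unnecessarily.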
  • 43. Annie’s Question: A Nimbus node is similar to a TaskTracker node in a Hadoop cluster. - True - False
  • 44. Annie’s Answer: False. A Nimbus node is more like a JobTracker node in Hadoop.
  • 45. Storm Components: Five key abstractions help to understand how Storm processes data:
      » Tuples: an ordered list of elements. For example, a “4-tuple” might be (7, 1, 3, 7)
      » Streams: an unbounded sequence of tuples
      » Spouts: sources of streams in a computation (e.g. a Twitter API)
      » Bolts: process input streams and produce output streams. They can run functions; filter, aggregate, or join data; or talk to databases
      » Topologies: the overall calculation, represented visually as a network of spouts and bolts
      Storm users define topologies for how to process the data when it comes streaming in from the spout.
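The five abstractions can be tied together in a short plain-Python sketch of a topology as a network of named components. This is not the Storm API: the `Topology` class, its `set_spout`, `set_bolt`, and `run` methods, and the component names are assumptions of this example, and the stream here is a finite list rather than an unbounded one.

```python
from collections import defaultdict


class Topology:
    """Toy topology: spouts and bolts wired together by name."""

    def __init__(self):
        self.spouts = {}                      # name -> iterable of tuples
        self.bolts = {}                       # name -> function(tuple) -> list of tuples
        self.subscribers = defaultdict(list)  # upstream name -> downstream bolt names

    def set_spout(self, name, source):
        self.spouts[name] = source

    def set_bolt(self, name, func, upstream):
        self.bolts[name] = func
        self.subscribers[upstream].append(name)

    def _emit(self, component, tup, outputs):
        # Record the tuple on this component's output stream, then
        # feed it to every bolt subscribed to that stream.
        outputs[component].append(tup)
        for bolt_name in self.subscribers[component]:
            for out in self.bolts[bolt_name](tup):
                self._emit(bolt_name, out, outputs)

    def run(self):
        outputs = defaultdict(list)
        for name, source in self.spouts.items():
            for tup in source:
                self._emit(name, tup, outputs)
        return outputs


topology = Topology()
# Spout: emits 4-tuples such as (7, 1, 3, 7).
topology.set_spout("numbers", [(7, 1, 3, 7), (2, 2, 2, 2)])
# Bolt that runs a function: sums each tuple into a 1-tuple.
topology.set_bolt("sum", lambda t: [(sum(t),)], upstream="numbers")
# Bolt that filters: keeps only sums greater than 10.
topology.set_bolt("large", lambda t: [t] if t[0] > 10 else [], upstream="sum")

result = topology.run()
```

Here `result["sum"]` holds the output stream of the summing bolt and `result["large"]` holds what survives the filter. The topology itself is just the wiring: which spout feeds which bolt, and which bolt feeds the next.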
  • 46. Annie’s Question: A Storm topology is defined in terms of: - Nimbus, ZooKeeper, Supervisor nodes - Spout, Bolt - Spout, Bolt, Nimbus, ZooKeeper, Supervisor nodes - Spout, Bolt, ZooKeeper node
  • 47. Annie’s Answer: Spout and Bolt
  • 48. Use Cases of Storm:
      » Processing Streams: Unlike other stream processing systems, with Storm there’s no need for intermediate queues.
      » Continuous Computation: Send data to clients continuously so they can update and show results in real time, such as site metrics.
      » Distributed Remote Procedure Call: Easily parallelize CPU-intensive operations.
  • 49. Use Cases of Storm: Storm can be used to prevent certain outcomes or to optimize objectives.
      » Financial Services: Securities Fraud; Compliance Violations; Order Routing; Pricing
      » Telecom: Security Breaches; Network Outages; Bandwidth Allocation; Customer Service
      » Retail: Shrinkage; Stock outs; Offers; Pricing
      » Web: Application Failure; Operational Issues; Personalized Content
  • 50. Key Differentiators:
      » Simple to Program: It’s painful to do real-time processing from scratch. With Storm, complexity is reduced drastically.
      » Support for Multiple Programming Languages: It’s easier to develop in a JVM-based language, but Storm supports any language as long as you use or implement a small intermediary library.
      » Fault-tolerant: The Storm cluster takes care of workers going down, reassigning tasks when necessary.
  • 51. Assignment: Try setting up a single-node Storm cluster on your system, as shown in the LMS document on Apache Storm single-node cluster installation.
  • 52. Pre-work: Install Ubuntu VMware Player on your system. Install a single-node Storm cluster on your system.
  • 53. What’s within the LMS: This section will give you an insight into the Apache Storm course.
  • 54. What’s within the LMS: Click here to expand and view all the elements of this Module
  • 55. What’s within the LMS: Assignment; Pre-work; Quiz