Distributed and Fault-Tolerant Realtime Computation
www.folio3.com @folio_3
Folio3 – Overview
Who We Are
• We are a development partner for our customers
• We design software solutions, not just implement them
• We focus on the solution, and are platform and technology agnostic
• We have expertise in building applications that are mobile, social, cloud-based, and gamified
What We Do
• Areas of focus
  • Enterprise
    • Custom enterprise applications
    • Product development targeting the enterprise
  • Mobile
    • Custom mobile apps for iOS, Android, Windows Phone, and BlackBerry OS
    • Mobile platform (server-to-server) development
  • Social Media
    • CMS-based websites for consumers and enterprises (corporate, consumer, community, and social networking)
    • Social media platform development (enterprise and consumer)
Folio3 at a Glance
• Founded in 2005
• Over 200 full-time employees
• Offices in the US, Canada, Bulgaria, and Pakistan
  • Palo Alto, CA
  • Toronto, Canada
  • Sofia, Bulgaria
  • Karachi, Pakistan
Areas of Focus: Enterprise
• Automating workflows
• Cloud-based solutions
• Application integration
• Platform development
• Healthcare
• Mobile enterprise
• Digital media
• Supply chain
Some of Our Enterprise Clients
Areas of Focus: Mobile
• Serious enterprise applications for banks and businesses
• Fun consumer apps for app discovery, interaction, exercise gamification, and play
• Educational apps
• Augmented reality apps
• Mobile platforms
Some of Our Mobile Clients
Areas of Focus: Web & Social Media
• Community sites based on content management systems
• Enterprise social networking
• Social games for Facebook and mobile
• Companion apps for games
Some of Our Web Clients
Agenda
• Big Data
• Hadoop vs. Storm
• Lambda Architecture
• Storm Architecture and Concepts
Big Data
“Big Data” is commonly described along four dimensions:
• Volume: the scale of the data (terabytes, petabytes, exabytes)
• Velocity: the data must be analyzed quickly (milliseconds to seconds to respond)
• Variety: the different forms of data (and of data sources)
• Veracity: the uncertainty of the data (due to inconsistency, ambiguity, latency, and incompleteness)
Example Query
Total number of page views of a website URL over a range of time
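import java.util.List;

// A single page-view event.
record PageView(String url, long timestamp) {}

class PageViewQuery {
    // Naive approach: scan every record and count the page views of `url`
    // whose timestamp falls within [startTime, endTime].
    static long pageViewsOverTime(List<PageView> bigData, String url,
                                  long startTime, long endTime) {
        long count = 0;
        for (PageView data : bigData) {
            if (data.url().equals(url)  // compare strings by value, not with ==
                    && data.timestamp() >= startTime
                    && data.timestamp() <= endTime) {
                count++;
            }
        }
        return count;
    }
}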
Too slow: a sequential scan over every record does not scale when the data is measured in petabytes (Volume).
Hadoop Data Processing Architecture
Data Store (HDFS) → Hadoop (MapReduce) → Batch View (processed data) → Query
• Views generated in batch may be out of date
• The batch workflow is too slow
[Figure: data flow of a batch run]
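To make the batch approach concrete, here is a minimal sketch of the MapReduce pattern applied to the page-view query: each partition of the data is scanned independently (map), and the partial counts are summed (reduce). It uses Java parallel streams on one machine as a stand-in for a Hadoop cluster; `PageView`, `countInPartition`, and `pageViewsOverTime` are hypothetical names for illustration, not Hadoop APIs.

```java
import java.util.List;

// Hypothetical record for one page-view event (not a Hadoop type).
record PageView(String url, long timestamp) {}

class MapReducePageViews {
    // "Map" step: count matching page views within one partition of the data.
    static long countInPartition(List<PageView> partition, String url,
                                 long startTime, long endTime) {
        return partition.stream()
                .filter(v -> v.url().equals(url))
                .filter(v -> v.timestamp() >= startTime && v.timestamp() <= endTime)
                .count();
    }

    // "Reduce" step: partitions are processed in parallel and the partial counts summed.
    static long pageViewsOverTime(List<List<PageView>> partitions, String url,
                                  long startTime, long endTime) {
        return partitions.parallelStream()
                .mapToLong(p -> countInPartition(p, url, startTime, endTime))
                .sum();
    }
}
```

Parallelism shortens the scan, but the result is still only as fresh as the last batch run, which is the limitation listed above.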
Lambda Architecture
Immutable master dataset (stored in HDFS)
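In the Lambda Architecture, a query is typically answered by merging a precomputed batch view (rebuilt periodically from the immutable master dataset) with a realtime view that covers only the data that arrived after the last batch run. The sketch below assumes both views are per-URL page-view counts held in maps; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class LambdaQuery {
    // batchView: counts from the last batch run over the master dataset.
    // realtimeView: counts for events that arrived after that batch run.
    static Map<String, Long> merge(Map<String, Long> batchView,
                                   Map<String, Long> realtimeView) {
        Map<String, Long> result = new HashMap<>(batchView);
        // Add each realtime count on top of the batch count, or insert it if absent.
        realtimeView.forEach((url, n) -> result.merge(url, n, Long::sum));
        return result;
    }
}
```

When the next batch run completes, the realtime counts it has absorbed can be discarded, so the realtime layer only ever holds a small, recent window of data.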
What is Apache Storm?
• Storm is a real-time, distributed computing framework for reliably processing large volumes of high-velocity, unbounded data streams.
• It was created by Nathan Marz and his team at BackType and released as open source in 2011, after BackType was acquired by Twitter.
Five characteristics make Storm well suited to real-time data processing workloads:
• Fast: benchmarked at processing over one million 100-byte messages per second per node
• Scalable: computations run in parallel across a cluster of machines
• Fault-tolerant: when workers die, Storm automatically restarts them; if a node dies, its work is restarted on another node
• Reliable: Storm guarantees that each unit of data (tuple) is processed at least once, or exactly once when using the Trident API; messages are replayed only when there are failures
• Easy to operate: standard configurations are suitable for production on day one, and once deployed, Storm is easy to operate
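The at-least-once guarantee rests on simple bookkeeping: the source remembers every message it has emitted until that message is acknowledged, and re-emits it if processing fails. Storm's spout interface exposes `ack` and `fail` callbacks keyed by a message id for this purpose; the plain-Java sketch below only simulates the idea, and `ReplayingSource` with its methods is a hypothetical class, not Storm's API.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical source that keeps each message in flight until it is acked.
class ReplayingSource {
    private final Queue<String> fresh = new ArrayDeque<>();          // never emitted
    private final Queue<Long> retry = new ArrayDeque<>();            // failed, awaiting replay
    private final Map<Long, String> pending = new LinkedHashMap<>(); // emitted, not yet acked
    private long nextId = 0;

    ReplayingSource(Iterable<String> messages) {
        messages.forEach(fresh::add);
    }

    // Returns the id of the next message to process, replaying failed ones first.
    // Returns null when there is nothing left to emit.
    Long emit() {
        if (!retry.isEmpty()) return retry.poll();
        String msg = fresh.poll();
        if (msg == null) return null;
        long id = nextId++;
        pending.put(id, msg);
        return id;
    }

    String payload(long id) { return pending.get(id); }

    // Processing succeeded: forget the message for good.
    void ack(long id) { pending.remove(id); }

    // Processing failed downstream: schedule the same message for replay.
    void fail(long id) {
        if (pending.containsKey(id)) retry.add(id);
    }

    int inFlight() { return pending.size(); }
}
```

Because a failed message is replayed in full, a downstream step may see it twice; that is why this scheme yields at-least-once rather than exactly-once processing.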
Tweet from Nathan Marz (31 May 2012)
Storm Topology
• The input stream of a Storm cluster is handled by a component called a spout.
• The spout passes the data to a component called a bolt, which transforms it in some way.
• A bolt either persists the data in storage or passes it on to other bolts.
Functional Programming
h(g(f(data)))
λ-calculus
Sample Problem
File: Bible.txt
… Thus the heavens and the earth were finished, and all the host of them.
And on the seventh day God ended his work which he had made
and he rested on the seventh day from all his work which he had made…
f (lines): {“Thus the heavens and the earth were finished, and all the host of them.”}, {“And on the seventh day God ended his work which he had made”}
g (words): (“thus”, “the”, “heavens”, “and”, “the”, “earth”, “were”, “finished”, “and”, “all”, “the”, “host”, “of”, “them”)
h (word counts): ( (“testaments”, 10), (“holy”, 12), (“faith”, 34) )
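The sample problem can be written literally as h(g(f(data))) with `java.util.function.Function`: f reads lines, g splits them into lowercase words, and h counts the words. This is a batch, single-machine sketch; the class name and the splitting rule (lowercase, split on anything that is not a letter) are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCountPipeline {
    // f: line-reader, turns raw text into lines.
    static final Function<String, List<String>> f =
            text -> Arrays.asList(text.split("\\R"));

    // g: word-splitter, turns lines into lowercase words (splits on non-letters).
    static final Function<List<String>, List<String>> g =
            lines -> lines.stream()
                    .flatMap(line -> Arrays.stream(line.toLowerCase().split("[^a-z]+")))
                    .filter(w -> !w.isEmpty())
                    .collect(Collectors.toList());

    // h: word-counter, turns words into (word, count) pairs sorted by word.
    static final Function<List<String>, Map<String, Long>> h =
            words -> words.stream()
                    .collect(Collectors.groupingBy(Function.identity(), TreeMap::new,
                            Collectors.counting()));

    // h(g(f(data))) as one composed function.
    static final Function<String, Map<String, Long>> wordCount = f.andThen(g).andThen(h);
}
```

`f.andThen(g).andThen(h)` applies f first and h last, which is exactly the nesting order of h(g(f(data))).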
Relationship of Storm Topology with Functional Programming
Data → Spout (f: Line-reader) → Bolt (g: Word-Splitter) → Bolt (h: Word-Counter)
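In a Storm topology the same three functions run continuously and per tuple rather than once over a whole dataset: the spout emits one line at a time, the splitter bolt emits one tuple per word, and the counter bolt keeps a running count as its local state. The sketch below models that flow with plain Java; the `Bolt` interface and `StreamingWordCount` class are hypothetical stand-ins, not Storm's API (a real topology would be wired with Storm's `TopologyBuilder`, `setSpout`, and `setBolt`).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical bolt abstraction: process one tuple, emit zero or more tuples downstream.
interface Bolt<I, O> {
    void execute(I tuple, Consumer<O> emit);
}

class StreamingWordCount {
    // Running counts: the word-counter bolt's state, updated as tuples arrive.
    private final Map<String, Long> counts = new HashMap<>();

    // g: word-splitter bolt, one line in, one tuple per word out.
    private final Bolt<String, String> splitter = (line, emit) -> {
        for (String w : line.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty()) emit.accept(w);
        }
    };

    // h: word-counter bolt, emits (word, updated count) for every incoming word.
    private final Bolt<String, Map.Entry<String, Long>> counter = (word, emit) ->
            emit.accept(Map.entry(word, counts.merge(word, 1L, Long::sum)));

    // f: line-reader spout, feeds lines into the bolt chain one at a time.
    void runSpout(List<String> lines, Consumer<Map.Entry<String, Long>> sink) {
        lines.forEach(line -> splitter.execute(line, word -> counter.execute(word, sink)));
    }

    long count(String word) { return counts.getOrDefault(word, 0L); }
}
```

Unlike the batch composition, results are available after every tuple, so downstream consumers see counts update continuously.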
Data Source Reliability
• A data source is considered “unreliable” if there is no means to replay a message.
• A data source is considered “reliable” if it can somehow replay a message when processing fails at any point.
• A data source is considered “durable” if it can replay any message, or set of messages, given the necessary selection criteria.
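A “durable” source is what an append-only log such as a Kafka topic provides: each message keeps a position (offset) for as long as it is retained, so a consumer can re-read from whichever offset it chooses. The sketch below is a hypothetical in-memory stand-in for such a log, not Kafka's client API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory append-only log; each message is addressed by its offset.
class DurableLog {
    private final List<String> entries = new ArrayList<>();

    // Appends a message and returns the offset it was stored at.
    long append(String message) {
        entries.add(message);
        return entries.size() - 1;
    }

    // Returns every message from `fromOffset` onward. Reading deletes nothing,
    // so the same range can be replayed any number of times.
    List<String> readFrom(long fromOffset) {
        return new ArrayList<>(entries.subList((int) fromOffset, entries.size()));
    }
}
```

The offset is the “selection criterion” from the definition above: given an offset, the log can replay that message and everything after it.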
Reliability Limitations: Integrating Kafka with Apache Storm
• Exactly-once processing requires a “durable” data source.
• At-least-once processing requires a “reliable” data source.
• An “unreliable” data source can be wrapped to provide additional guarantees.
• For the Apache Storm demo, I backed the unreliable data source with Apache Kafka, which makes it durable at the cost of a small latency overhead.
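Durability is what makes exactly-once effects possible: if the offset of the last applied message is stored together with the state it produced, a replayed message can be recognized and skipped. Storm's Trident API relies on a similar idea, storing a transaction id alongside each value; the counter below is a simplified, hypothetical illustration of the principle, not Trident's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical idempotent counter: the counts and the last-applied offset change together.
class ExactlyOnceCounter {
    private final Map<String, Long> counts = new HashMap<>();
    private long lastAppliedOffset = -1;

    // Applies a message unless it was already applied (e.g. replayed after a failure).
    // Assumes messages arrive in offset order, as when re-reading a single log partition.
    boolean apply(long offset, String word) {
        if (offset <= lastAppliedOffset) {
            return false; // duplicate delivery: skip so the count is not inflated
        }
        counts.merge(word, 1L, Long::sum);
        lastAppliedOffset = offset; // a real system must persist this atomically with counts
        return true;
    }

    long count(String word) { return counts.getOrDefault(word, 0L); }
}
```

With an unreliable source there is no offset to compare against, so duplicates after a replay could not be told apart from genuinely new messages.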
Relationship of Storm Topology with Functional Programming
Data → Kafka topic “bible” (messages …5|4|3|2|1)
→ Spout (f: Line-reader), subscribed to the topic “bible” of the Kafka messaging queue
→ Bolt (g: Word-Splitter)
→ Bolt (h: Word-Counter)
Scenarios and Use Cases Where Storm Can Be Used Effectively
• Predictive Analysis
• Social Graph Analysis
• Network Monitoring
• Recommendation Engine
• Realtime Analytics
• Online Machine Learning
• Continuous Computation
• Distributed Remote Procedure Call
• Website Activity Tracking
• Log Aggregation
Storm Components
A Storm cluster has three sets of nodes:
• Nimbus nodes (master node daemon)
  • Distributes code across the cluster
  • Launches workers across the cluster
  • Monitors computation and reallocates workers as needed
• ZooKeeper nodes
  • Manage all the coordination between Nimbus and the supervisors
• Supervisor nodes
  • Execute a subset of the topology (spouts and/or bolts)
  • Listen for jobs assigned to the machine, and start and stop worker processes as necessary
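Nimbus detects that a supervisor has died when that supervisor's heartbeats (exchanged via ZooKeeper) stop arriving, and then moves its tasks to machines that are still alive. The sketch below simulates that loop in memory; the timeout, the round-robin placement, and the class and method names are assumptions for illustration and do not mirror Storm's actual scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of Nimbus reassigning tasks away from dead supervisors.
class NimbusSimulator {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new TreeMap<>(); // supervisor -> time
    private final Map<String, String> assignment = new TreeMap<>();  // task -> supervisor

    NimbusSimulator(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    void heartbeat(String supervisor, long now) { lastHeartbeat.put(supervisor, now); }

    void assign(String task, String supervisor) { assignment.put(task, supervisor); }

    String supervisorOf(String task) { return assignment.get(task); }

    // Moves every task whose supervisor has missed the heartbeat timeout onto the
    // live supervisors, round-robin. Returns the tasks that were moved.
    List<String> reassign(long now) {
        List<String> alive = new ArrayList<>();
        lastHeartbeat.forEach((s, t) -> { if (now - t <= timeoutMillis) alive.add(s); });
        List<String> moved = new ArrayList<>();
        if (alive.isEmpty()) return moved; // nowhere to move the work to
        int next = 0;
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (!alive.contains(e.getValue())) {
                e.setValue(alive.get(next++ % alive.size()));
                moved.add(e.getKey());
            }
        }
        return moved;
    }
}
```

If this loop is not running, dead supervisors' tasks simply stay where they were, which is the Nimbus limitation discussed next.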
Known Limitations
• Nimbus is a single point of failure (in Storm releases before 1.0, which introduced Nimbus high availability)
• When Nimbus is down:
  • Running topologies continue to work
  • Tasks from failed nodes (spouts/bolts) are not reassigned to other machines
  • New topologies cannot be uploaded, and existing ones cannot be rebalanced
• It is recommended to run Nimbus under a supervision tool such as daemontools or monit, so that it is restarted automatically when it goes down.
(In contrast, in Hadoop 1.x, if the JobTracker dies, all running jobs are lost.)
Contact
• For more details about our services, please get in touch with us.
contact@folio3.com
US Office: (408) 365-4638
www.folio3.com

 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache Kafka and Apache Zookeeper

  • 1. Distributed and Fault-Tolerant Realtime Computation www.folio3.com @folio_3
  • 2. Folio3 – Overview www.folio3.com @folio_3
  • 3. Who We Are
    - We are a Development Partner for our customers
    - Design software solutions, not just implement them
    - Focus on the solution – Platform and technology agnostic
    - Expertise in building applications that are: Mobile, Social, Cloud-based, Gamified
  • 4. What We Do
    - Areas of Focus
      - Enterprise
        - Custom enterprise applications
        - Product development targeting the enterprise
      - Mobile
        - Custom mobile apps for iOS, Android, Windows Phone, BB OS
        - Mobile platform (server-to-server) development
      - Social Media
        - CMS-based websites for consumers and enterprise (corporate, consumer, community & social networking)
        - Social media platform development (enterprise & consumer)
  • 5. Folio3 At a Glance
    - Founded in 2005
    - Over 200 full-time employees
    - Offices in the US, Canada, Bulgaria & Pakistan
      - Palo Alto, CA
      - Sofia, Bulgaria
      - Karachi, Pakistan
      - Toronto, Canada
  • 6. Areas of Focus: Enterprise
    - Automating workflows
    - Cloud-based solutions
    - Application integration
    - Platform development
    - Healthcare
    - Mobile Enterprise
    - Digital Media
    - Supply Chain
  • 7. Some of Our Enterprise Clients
  • 8. Areas of Focus: Mobile
    - Serious enterprise applications for banks and businesses
    - Fun consumer apps for app discovery, interaction, exercise gamification and play
    - Educational apps
    - Augmented reality apps
    - Mobile platforms
  • 9. Some of Our Mobile Clients
  • 10. Areas of Focus: Web & Social Media
    - Community sites based on content management systems
    - Enterprise social networking
    - Social games for Facebook & mobile
    - Companion apps for games
  • 11. Some of Our Web Clients
  • 12. www.folio3.com @folio_3 Distributed and Fault-Tolerant Realtime Computation
  • 13. Agenda
    - Big Data
    - Hadoop vs. Storm
    - Lambda Architecture
    - Storm Architecture and Concepts
  • 14. Big Data
    Big Data can be understood along four dimensions:
    - Volume: scale of data (terabytes, petabytes, exabytes)
    - Velocity: data must be analyzed quickly (milliseconds to seconds to respond)
    - Variety: different forms of data (and data sources)
    - Veracity: uncertainty of data (due to inconsistency, ambiguity, latency and incompleteness)
  • 15. Example Query
    - Total number of page views of a website URL over a range of time
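  • 16. Example Query
        import java.util.List;

        // Assumed event type: one page view of a URL at a timestamp.
        record PageView(String url, long timestamp) {}

        final class ExampleQuery {
            // Scans every record: cost grows with the size of the whole dataset.
            static long pageViewsOverTime(List<PageView> bigData, String url, long startTime, long endTime) {
                long count = 0;
                for (PageView data : bigData) {
                    if (data.url().equals(url)
                            && data.timestamp() >= startTime
                            && data.timestamp() <= endTime) {
                        count++;
                    }
                }
                return count;
            }
        }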
  • 17. Example Query
    - TOO SLOW: the query scans every record, and Big Data is measured in petabytes (Volume)
  • 18. Hadoop Data Processing Architecture
    - Data flow: Data Store (HDFS) → Hadoop (MapReduce, batch run) → Batch View (processed data) → Query
    - Views generated in batch may be out of date
    - The batch workflow is too slow
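To make the batch-view idea concrete, here is a minimal sketch (Java 17) of what a batch run could precompute for the page-view query on slide 16: counts bucketed by URL and hour, so a query adds up a handful of buckets instead of scanning every record. The `PageView` record, the one-hour bucket size and the `HourlyBatchView` class are illustrative assumptions, not part of the deck or of Hadoop's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a batch view: page-view counts precomputed per (url, hour bucket).
final class HourlyBatchView {
    static final long BUCKET_SECONDS = 3600; // assumed granularity: one hour

    // url -> (hour bucket -> number of page views in that hour)
    private final Map<String, Map<Long, Long>> counts = new HashMap<>();

    // "Batch run": one full pass over the master dataset, done before queries arrive.
    static HourlyBatchView build(List<PageView> masterDataset) {
        HourlyBatchView view = new HourlyBatchView();
        for (PageView pv : masterDataset) {
            long bucket = pv.timestamp() / BUCKET_SECONDS;
            view.counts.computeIfAbsent(pv.url(), u -> new HashMap<>())
                    .merge(bucket, 1L, Long::sum);
        }
        return view;
    }

    // Query: adds up the hourly buckets covering [startTime, endTime].
    // Exact only for hour-aligned ranges (start on an hour boundary, end on the last second of an hour).
    long pageViewsOverTime(String url, long startTime, long endTime) {
        Map<Long, Long> byBucket = counts.getOrDefault(url, Map.of());
        long total = 0;
        for (long b = startTime / BUCKET_SECONDS; b <= endTime / BUCKET_SECONDS; b++) {
            total += byBucket.getOrDefault(b, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        HourlyBatchView view = build(List.of(
                new PageView("/home", 10), new PageView("/home", 3700), new PageView("/about", 20)));
        System.out.println(view.pageViewsOverTime("/home", 0, 7199)); // 2
    }
}

// Assumed raw event: one page view of `url` at `timestamp` (epoch seconds).
record PageView(String url, long timestamp) {}
```

The trade-off is the one the slide names: the query becomes cheap, but the view only reflects data up to the last batch run.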
  • 20. Immutable Master Dataset (stored in HDFS)
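Because the master dataset is immutable and append-only, a batch view is simply a function of the data up to the last batch run; the gap the slides point out (batch views may be out of date) can be filled by a realtime view of newer events, merged at query time, which is the role Storm plays in a Lambda Architecture. The sketch below is a hypothetical, single-process illustration of that merge for per-URL page-view totals; the class and method names are assumptions, not the deck's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Lambda-style sketch: immutable master dataset + batch view + realtime view.
final class LambdaPageViews {
    private final List<String> masterDataset = new ArrayList<>();   // append-only log of page-view URLs
    private Map<String, Long> batchView = Map.of();                 // counts as of the last batch run
    private final Map<String, Long> realtimeView = new HashMap<>(); // counts since the last batch run

    // New data is only ever appended; the realtime layer updates its view incrementally.
    void append(String url) {
        masterDataset.add(url);
        realtimeView.merge(url, 1L, Long::sum);
    }

    // Batch layer: recompute the view from scratch over the whole dataset,
    // then drop the realtime counts that the new batch view now covers.
    void runBatch() {
        Map<String, Long> recomputed = new HashMap<>();
        for (String url : masterDataset) {
            recomputed.merge(url, 1L, Long::sum);
        }
        batchView = Collections.unmodifiableMap(recomputed);
        realtimeView.clear();
    }

    // Query: merge the (possibly out-of-date) batch view with the realtime view.
    long pageViews(String url) {
        return batchView.getOrDefault(url, 0L) + realtimeView.getOrDefault(url, 0L);
    }

    public static void main(String[] args) {
        LambdaPageViews views = new LambdaPageViews();
        views.append("/home");
        views.runBatch();      // batch view now covers the first page view
        views.append("/home"); // arrives after the batch run
        System.out.println(views.pageViews("/home")); // 2 = 1 (batch) + 1 (realtime)
    }
}
```

A real deployment runs the batch job concurrently with incoming data and may only discard realtime counts the finished batch actually covers; this sketch runs the batch synchronously to keep that bookkeeping out of the way.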
  • 21. What is Apache Storm?
    - Storm is a real-time distributed computing framework for reliably processing large volumes of high-velocity, unbounded data streams.
    - It was created by Nathan Marz and his team at BackType, and released as open source in 2011 (after BackType was acquired by Twitter).
  • 22. Five characteristics make Storm ideal for real-time data processing workloads:
    - Fast: benchmarked at processing over one million 100-byte messages per second per node
    - Scalable: calculations run in parallel across a cluster of machines
    - Fault-tolerant: when workers die, Storm automatically restarts them; if a node dies, the work is restarted on another node
    - Reliable: Storm guarantees that each unit of data (tuple) is processed at least once, or exactly once when using the higher-level Trident API; messages are replayed only when there are failures
    - Easy to operate: standard configurations are suitable for production on day one, and once deployed, Storm is easy to operate
  • 23. Tweet from Nathan Marz (31 May 2012)
  • 24. Storm Topology
    - The input stream of a Storm cluster is handled by a component called a spout.
    - The spout passes the data to a component called a bolt, which transforms it in some way.
    - A bolt either persists the data in storage or passes it on to another bolt.
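The spout-to-bolt flow can be modelled in plain Java to make the data path concrete. The `Spout`, `Bolt` and `TinyTopology` types below are hypothetical stand-ins written for this sketch; they are not Storm's API (a real topology implements Storm's spout and bolt interfaces and is wired together with its `TopologyBuilder`), and everything runs in a single thread with no parallelism or fault tolerance.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, single-threaded model of a topology: spout -> bolt -> ... -> sink.
final class TinyTopology {
    // Pulls every tuple from the spout and pushes it through the bolts in order.
    static void run(Spout spout, List<Bolt> bolts, Consumer<String> sink) {
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            push(tuple, bolts, 0, sink);
        }
    }

    private static void push(String tuple, List<Bolt> bolts, int index, Consumer<String> sink) {
        if (index == bolts.size()) {
            sink.accept(tuple); // output of the last bolt, e.g. persisted to storage
            return;
        }
        // Each tuple emitted by this bolt is passed on to the next one.
        bolts.get(index).execute(tuple, out -> push(out, bolts, index + 1, sink));
    }

    // A finite spout over an in-memory list (a real spout reads an unbounded stream).
    static Spout fromList(List<String> items) {
        Iterator<String> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        Bolt upper = (t, emit) -> emit.accept(t.toUpperCase());
        Bolt split = (t, emit) -> { for (String w : t.split(" ")) emit.accept(w); };
        run(fromList(List.of("thus the heavens")), List.of(upper, split), results::add);
        System.out.println(results); // [THUS, THE, HEAVENS]
    }
}

// Hypothetical source of the input stream; returns null when the demo source is exhausted.
interface Spout {
    String nextTuple();
}

// Hypothetical transformation step; may emit zero or more tuples downstream.
interface Bolt {
    void execute(String tuple, Consumer<String> emit);
}
```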
  • 26. Sample Problem (file: Bible.txt)
    - Input: … Thus the heavens and the earth were finished, and all the host of them. And on the seventh day God ended his work which he had made and he rested on the seventh day from all his work which he had made…
    - f (lines): {“Thus the heavens and the earth were finished, and all the host of them.”} {“And on the seventh day God ended his work which he had made”}
    - g (words): (“thus”, “the”, “heavens”, “and”, “the”, “earth”, “were”, “finished”, “and”, “all”, “the”, “host”, “of”, “them”)
    - h (word counts): ((“testaments”, 10), (“holy”, 12), (“faith”, 34))
  • 27. Relationship of Storm Topology with Functional Programming
    - Data → Spout (Line-reader, f) → Bolt (Word-Splitter, g) → Bolt (Word-Counter, h)
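The f, g, h view of the word-count topology can be written directly as function composition, h(g(f(text))): f reads lines, g splits them into lowercase words, h counts words. This is a hypothetical single-process sketch built on `java.util.function.Function` (Java 17), not Storm code; in the topology each stage runs as a separate, parallelisable spout or bolt and counts are updated tuple by tuple rather than over a finished list. The field names `f`, `g`, `h` mirror the slide rather than Java's constant-naming convention.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class WordCountPipeline {
    // f: Line-reader - split raw text into lines (the spout's job in the topology).
    static final Function<String, List<String>> f = text -> text.lines().toList();

    // g: Word-Splitter - turn lines into lowercase words, dropping punctuation.
    static final Function<List<String>, List<String>> g = lines -> lines.stream()
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("[^a-z]+")))
            .filter(word -> !word.isEmpty())
            .toList();

    // h: Word-Counter - count occurrences of each word (TreeMap for sorted output).
    static final Function<List<String>, Map<String, Integer>> h = words -> {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    };

    // The whole pipeline as one composed function: h(g(f(text))).
    static final Function<String, Map<String, Integer>> wordCount = f.andThen(g).andThen(h);

    public static void main(String[] args) {
        String text = "Thus the heavens and the earth were finished, and all the host of them.\n"
                + "And on the seventh day God ended his work which he had made";
        System.out.println(wordCount.apply(text));
    }
}
```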
  • 28. Data Source Reliability
    - A data source is considered “unreliable” if there is no means to replay a message.
    - A data source is considered “reliable” if it can somehow replay a message when processing fails at any point.
    - A data source is considered “durable” if it can replay any message or set of messages given the necessary selection criteria.
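A "reliable" source in the sense above has to remember each emitted message until processing is confirmed, and re-emit it on failure; Storm spouts expose `ack` and `fail` callbacks for this purpose. The `ReplayingSource` class below is a hypothetical plain-Java model of that bookkeeping, not Storm's `ISpout` interface, and it yields at-least-once delivery: a message can be processed more than once, but is not silently lost.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "reliable" source: every emitted message stays pending until acked;
// a failed message is queued again for replay (at-least-once delivery).
final class ReplayingSource {
    public record Message(long id, String body) {}

    private final Deque<Message> toEmit = new ArrayDeque<>();
    private final Map<Long, Message> pending = new HashMap<>();

    ReplayingSource(List<String> bodies) {
        long id = 0;
        for (String body : bodies) {
            toEmit.add(new Message(id++, body));
        }
    }

    // Emits the next message (fresh or replayed) and marks it pending; null if nothing to emit.
    Message next() {
        Message m = toEmit.poll();
        if (m != null) {
            pending.put(m.id(), m);
        }
        return m;
    }

    // Downstream processing succeeded: the message can be forgotten.
    void ack(long id) {
        pending.remove(id);
    }

    // Downstream processing failed: put the message back at the front for replay.
    void fail(long id) {
        Message m = pending.remove(id);
        if (m != null) {
            toEmit.addFirst(m);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource(List.of("line 1", "line 2"));
        Message first = source.next();
        source.fail(first.id());                  // simulate a failed bolt
        System.out.println(source.next().body()); // prints: line 1 (replayed)
    }
}
```

An "unreliable" source is one that cannot keep such a pending set, for example because the upstream system discards a message once it has been read.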
  • 29. Reliability Limitations: Integrating Kafka with Apache Storm
    - Exactly-once processing requires a “durable” data source.
    - At-least-once processing requires a “reliable” data source.
    - An “unreliable” data source can be wrapped to provide additional guarantees.
    - For the Apache Storm demo, I backed the unreliable data source with Apache Kafka, accepting a minor latency overhead to make it durable.
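Backing an unreliable source with Kafka works because Kafka keeps messages in an append-only log addressed by offsets, and a consumer can resume from the last offset it committed. The sketch below models that idea in memory: records are written to a `DurableLog` first, and a `CommittingConsumer` advances its committed offset only after a record is processed successfully, so after a failure it re-reads from the last commit. Both classes are hypothetical and are not the Kafka client API; the result is at-least-once, because a record that was processed but whose commit was lost would be read again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a Kafka topic partition: an append-only,
// offset-addressed log that can replay from any offset ("durable" in the slide's terms).
final class DurableLog {
    private final List<String> records = new ArrayList<>();

    // Producer side: the unreliable source writes here first; returns the record's offset.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Reads from `offset` onwards; reading never removes anything from the log.
    List<String> readFrom(long offset) {
        return List.copyOf(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        DurableLog topic = new DurableLog(); // stands in for the "bible" topic
        topic.append("Thus the heavens and the earth were finished");
        topic.append("And on the seventh day God ended his work");
        CommittingConsumer consumer = new CommittingConsumer();
        consumer.poll(topic, line -> !line.startsWith("And")); // second record "fails"
        System.out.println(consumer.committedOffset());        // 1: the failed record will be re-read
        consumer.poll(topic, line -> true);                     // retry succeeds
        System.out.println(consumer.committedOffset());        // 2
    }
}

// Hypothetical consumer that commits its offset only after successful processing.
final class CommittingConsumer {
    private long committedOffset = 0;

    // Processes records from the last commit; stops at the first failure so the
    // failed record (and everything after it) is re-read on the next poll.
    void poll(DurableLog log, Predicate<String> process) {
        for (String record : log.readFrom(committedOffset)) {
            if (!process.test(record)) {
                return;
            }
            committedOffset++;
        }
    }

    long committedOffset() {
        return committedOffset;
    }
}
```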
  • 30. Relationship of Storm Topology with Functional Programming
    - Data: Kafka topic bible (messages …5|4|3|2|1) → Spout (Line-reader, f; subscribed to topic bible of the Kafka messaging queue) → Bolt (Word-Splitter, g) → Bolt (Word-Counter, h)
  • 31. Scenarios / use cases where Storm can be used effectively
    - Predictive Analysis
    - Social Graph Analysis
    - Network Monitoring
    - Recommendation Engine
    - Realtime Analytics
    - Online Machine Learning
    - Continuous Computation
    - Distributed Remote Procedure Call
    - Website Activity Tracking
    - Log Aggregation
  • 32. Storm Components
    - A Storm cluster has 3 sets of nodes: Nimbus nodes, ZooKeeper nodes and Supervisor nodes
  • 33. Storm Components: Nimbus Nodes
    - Master node daemon
    - Distributes code across the cluster
    - Launches workers across the cluster
    - Monitors computation and reallocates workers as needed
  • 34. Storm Components: ZooKeeper Nodes
    - Manage all the coordination between Nimbus and the supervisors
  • 35. Storm Components: Supervisor Nodes
    - Execute a subset of a topology (spouts and/or bolts)
    - Listen for jobs assigned to the machine and start and stop worker processes as necessary
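The division of labour between Nimbus and the supervisors can be illustrated with a simplified model of failure handling: supervisors report heartbeats through the coordination layer (ZooKeeper, per the slides), and when a supervisor's heartbeat is too old, Nimbus moves its tasks to live supervisors. `ToyNimbus` below is a hypothetical sketch; the 30-second timeout, the round-robin reassignment and all names are assumptions, and Storm's real scheduler is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of Nimbus-style failure handling (not Storm's scheduler).
final class ToyNimbus {
    static final long HEARTBEAT_TIMEOUT_MS = 30_000; // assumed timeout

    private final Map<String, Long> lastHeartbeat = new HashMap<>(); // supervisor -> last heartbeat time
    private final Map<String, String> assignment = new TreeMap<>();  // task -> supervisor

    void heartbeat(String supervisor, long nowMs) {
        lastHeartbeat.put(supervisor, nowMs);
    }

    void assign(String task, String supervisor) {
        assignment.put(task, supervisor);
    }

    // Moves every task whose supervisor has a stale (or no) heartbeat to live supervisors, round-robin.
    void reassignDeadWork(long nowMs) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() <= HEARTBEAT_TIMEOUT_MS) {
                live.add(e.getKey());
            }
        }
        if (live.isEmpty()) {
            return; // nowhere to move work
        }
        Collections.sort(live); // deterministic order for the sketch
        int next = 0;
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (!live.contains(e.getValue())) {
                e.setValue(live.get(next++ % live.size()));
            }
        }
    }

    String supervisorOf(String task) {
        return assignment.get(task);
    }

    public static void main(String[] args) {
        ToyNimbus nimbus = new ToyNimbus();
        nimbus.heartbeat("supervisor-1", 0);
        nimbus.heartbeat("supervisor-2", 0);
        nimbus.assign("word-splitter", "supervisor-2");
        nimbus.heartbeat("supervisor-1", 40_000); // supervisor-2 goes silent
        nimbus.reassignDeadWork(45_000);
        System.out.println(nimbus.supervisorOf("word-splitter")); // supervisor-1
    }
}
```

In this model the reassignment step only happens while Nimbus is running, which is consistent with the limitation listed for slide 36.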
  • 36. Known Limitations
    - Nimbus: a single point of failure (true before Storm 1.0, which added Nimbus high availability)
    - When Nimbus is down:
      - Topologies continue to work
      - Tasks from failing nodes (spouts/bolts) aren't reassigned to other machines
      - A new topology can't be uploaded, and an old one can't be rebalanced
    - It is recommended to run Nimbus under a supervision tool such as daemontools or monit so that it is restarted automatically when it goes down. (In contrast, in Hadoop, if the JobTracker dies, all running jobs are lost.)
  • 37. Contact
    - For more details about our services, please get in touch with us: contact@folio3.com
    - US Office: (408) 365-4638
    - www.folio3.com