SlideShare a Scribd company logo
1 of 20
Download to read offline
Streaming
Computing
Some thoughts and technology choices for event-driven processing
Natalino Busa - 29 Aug. 2013
Outline
● Concurrency
● Streaming computing
● Technologies
○ Gigaspaces
○ Storm
○ Akka
● Comparison matrix
● Opportunities
Algorithms: a tribute
Numbers and Algorithms:
9th century Persian Muslim
mathematician Abu Abdullah
Muhammad ibn Musa Al-Khwarizmi,
whose work built upon that of the 7th
century Indian mathematician
Brahmagupta.
We own a lot to these guys !!!
Why do we need parallelism?
It gets bigger,
It doesn’t get much faster
BUT
We get more cores in a chip.
More cores = more parallelism
We are happy now, right?
Moore’s law
Every 18 months, the number of CPU
core’s double
Another
interpretation:
Every 18 months, the number of idle
CPU core’s double
More parallelism
We trade:
Time vs ( CPU, Memory, I/O)
Modern applications
Scalability:
Vertical: concurrency
(use all the cores, memory and I/O of a given machine)
Horizontal: distribution
(use all the machines in the cluster)
High availability:
Fault tolerance: all levels (local, distributed)
(the terminator effect: you can stop it but can’t kill it )
Streaming applications
Performance:
Efficient use of resources:
CPU and memory, but also OS threads and sockets
Asynchronous:
event driven, reacts on new data
Distributed:
more machines = more performance
the algorithm is partitioned and/or replicated on the cluster
What to increase?
More CPU: It helps when there is
computation involved
More MEMORY: It helps when there is
more state to keep
More I/O: It helps when there are
more messages to transfer
Streaming or batch?
ProcessingData
Natalino Busa - 12 Feb. 2013
Data
source system target systemour system
What differentiate Streaming from Batch?
● Granularity of Data
● Granularity of Processing
Granularity impacts:
Throughput, Latency, and the Cost of the system!
The choice is yours
1000 events/sec (1 KB/event)
running on 100 cores all day long
“Wait a day, then process”
860 M events = 86 GB of data
Latency: 24 hours
Throughput: 1 update/day
BATCH: Hadoop
Latency 1ms
Throughput: 1000 updates/sec
STREAMING: Akka
“Do not wait”
Process the 1KB of data each msec.
“Both are valid options. It depends on the application domain and the
requirements/specs of the target and source systems”
Mapping it to existing applications
Granularity of
Data
256 GB 256 GB
Granularity of
Processing
1 CPU 100 CPU’s
Traditional DB systems Big Data (Hadoop)
Granularity of
Data
1 KB 1 KB
Granularity of
Processing
1 CPU 100 CPU’s
Traditional mail server Web application server
Technologies: Gigaspaces
Technologies: Storm
Topology
Supervising
Scaling
Technologies: Akka
Supervising:
tree of actors
Topology (statics and dynamic actors)
Scaling and
distributed processing
Technology matrix
GranularityofData Granularity of Processing
Small Big
Small Akka Akka
Gigaspaces
Big ? Storm
System end-to-end throughput
High ~ 10’000 events/sec Medium ~100 events/sec Low ~10 events/sec
Akka Storm/ Gigaspaces Scripting languages
Big Data in motion
Both are:
Distributed, fault-tolerant, streaming
- Storm
++ multi-language
-- not user/admin friendly
-- slow supervising
processing elements are jvm’s
ideal when data is coarse grained
- Akka
++ high throughput, fine grained actors
++ dynamic topologies
-- low-level, but high performance
processing elements are small and lightweight
ideal for millions of transactions per second
- Gigaspaces
++ combines memory + application distribution
-- framework api is not very flexible
processing elements are jvms
ideal for all-in-one solution, with little customization
Opportunity: Lambda Architecture
Logic layer
Software as a Service
e.g realt-time predictor
Natalino Busa - 12 Feb. 2013
from http://www.manning.com/marz/
Opportunity: Batch + Streaming
Batch
Computing
Front End Services
In-Memory
Distributed Database
In-memory
Distributed DB’s
Batch
Streaming
HTML5 Client / Responsive Applow-latency
HTTP API services FETCH
(refresh)
Streaming
Computing
Data Warehouses Messaging Busses
PUSH
(SSE, notifications)
Thanks
linkedin:
www.linkedin.com/in/natalinobusa
blog:
www.natalinobusa.com
twitter:
@natalinobusa

More Related Content

What's hot

2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classificationKrish_ver2
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Implementation levels of virtualization
Implementation levels of virtualizationImplementation levels of virtualization
Implementation levels of virtualizationGokulnath S
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bankpkaviya
 
Virtualization in Cloud Computing and Machine reference Model
Virtualization in Cloud Computing and Machine reference ModelVirtualization in Cloud Computing and Machine reference Model
Virtualization in Cloud Computing and Machine reference ModelDr Neelesh Jain
 
Virtualization in cloud computing ppt
Virtualization in cloud computing pptVirtualization in cloud computing ppt
Virtualization in cloud computing pptMehul Patel
 
Distributed computing
Distributed computingDistributed computing
Distributed computingshivli0769
 
Evolution of the cloud
Evolution of the cloudEvolution of the cloud
Evolution of the cloudsagaroceanic11
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningSlideshare
 
Ecg analysis in the cloud
Ecg analysis in the cloudEcg analysis in the cloud
Ecg analysis in the cloudgaurav jain
 
Cloud service management
Cloud service managementCloud service management
Cloud service managementgaurav jain
 
Cloud Computing and Services | PPT
Cloud Computing and Services | PPTCloud Computing and Services | PPT
Cloud Computing and Services | PPTSeminar Links
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architecturesPooja Dixit
 

What's hot (20)

Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Cloud sim
Cloud simCloud sim
Cloud sim
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Cloud Computing Architecture
Cloud Computing ArchitectureCloud Computing Architecture
Cloud Computing Architecture
 
Implementation levels of virtualization
Implementation levels of virtualizationImplementation levels of virtualization
Implementation levels of virtualization
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
 
Virtualization in Cloud Computing and Machine reference Model
Virtualization in Cloud Computing and Machine reference ModelVirtualization in Cloud Computing and Machine reference Model
Virtualization in Cloud Computing and Machine reference Model
 
Birch
BirchBirch
Birch
 
Virtualization in cloud computing ppt
Virtualization in cloud computing pptVirtualization in cloud computing ppt
Virtualization in cloud computing ppt
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Evolution of the cloud
Evolution of the cloudEvolution of the cloud
Evolution of the cloud
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Ecg analysis in the cloud
Ecg analysis in the cloudEcg analysis in the cloud
Ecg analysis in the cloud
 
Cloud service management
Cloud service managementCloud service management
Cloud service management
 
Cloud Computing and Services | PPT
Cloud Computing and Services | PPTCloud Computing and Services | PPT
Cloud Computing and Services | PPT
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architectures
 

Viewers also liked

Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Surface Computing
Surface ComputingSurface Computing
Surface Computingshivu1234
 
Surface computing
Surface computingSurface computing
Surface computingChandan Jha
 
Demystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache KafkaDemystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache Kafkaconfluent
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQLRTigger
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafkaconfluent
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHanborq Inc.
 
Surface computer ppt
Surface computer pptSurface computer ppt
Surface computer ppttejalc
 
Surface Computing
Surface ComputingSurface Computing
Surface Computingrandyp311
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 

Viewers also liked (13)

Surface Computing & Devices
Surface Computing & DevicesSurface Computing & Devices
Surface Computing & Devices
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Surface Computing
Surface ComputingSurface Computing
Surface Computing
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Surface computing
Surface computingSurface computing
Surface computing
 
Demystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache KafkaDemystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache Kafka
 
Big Data
Big DataBig Data
Big Data
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafka
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
Surface computer ppt
Surface computer pptSurface computer ppt
Surface computer ppt
 
Surface Computing
Surface ComputingSurface Computing
Surface Computing
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 

Similar to Streaming computing: architectures, and tchnologies

Cluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomesCluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomesGuy Coates
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores inside-BigData.com
 
Scientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesScientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesJamie Kinney
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Parallel architecture
Parallel architectureParallel architecture
Parallel architectureMr SMAK
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresSpark Summit
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumbergerinside-BigData.com
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsHeiko Joerg Schick
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAmazon Web Services
 
The Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half OverThe Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half Overinside-BigData.com
 
BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataZhong Wang
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Matej Misik
 
BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)Ashok Rangaswamy
 

Similar to Streaming computing: architectures, and tchnologies (20)

Cluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomesCluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomes
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores 
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Scientific Computing With Amazon Web Services
Scientific Computing With Amazon Web ServicesScientific Computing With Amazon Web Services
Scientific Computing With Amazon Web Services
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Parallel architecture
Parallel architectureParallel architecture
Parallel architecture
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi Torres
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWS
 
The Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half OverThe Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half Over
 
BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing data
 
Nbvtalkatjntuvizianagaram
NbvtalkatjntuvizianagaramNbvtalkatjntuvizianagaram
Nbvtalkatjntuvizianagaram
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
supercomputer
supercomputersupercomputer
supercomputer
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)
 

More from Natalino Busa

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationNatalino Busa
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksNatalino Busa
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networksNatalino Busa
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooksNatalino Busa
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditingNatalino Busa
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friendsNatalino Busa
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analyticsNatalino Busa
 
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Natalino Busa
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayNatalino Busa
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
Big data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsBig data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsNatalino Busa
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API'sNatalino Busa
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Natalino Busa
 
Big and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsBig and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsNatalino Busa
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsNatalino Busa
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsNatalino Busa
 

More from Natalino Busa (19)

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovation
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter Notebooks
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooks
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
 
Data in Action
Data in ActionData in Action
Data in Action
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analytics
 
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and Spray
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Big data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsBig data solutions for advanced marketing analytics
Big data solutions for advanced marketing analytics
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API's
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.
 
Big and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsBig and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analytics
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topics
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 

Recently uploaded

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Streaming computing: architectures, and tchnologies

  • 1. Streaming Computing Some thoughts and technology choices for event-driven processing Natalino Busa - 29 Aug. 2013
  • 2. Outline ● Concurrency ● Streaming computing ● Technologies ○ Gigaspaces ○ Storm ○ Akka ● Comparison matrix ● Opportunities
  • 3. Algorithms: a tribute Numbers and Algorithms: 9th century Persian Muslim mathematician Abu Abdullah Muhammad ibn Musa Al-Khwarizmi, whose work built upon that of the 7th century Indian mathematician Brahmagupta. We own a lot to these guys !!!
  • 4. Why do we need parallelism? It gets bigger, It doesn’t get much faster BUT We get more cores in a chip. More cores = more parallelism We are happy now, right?
  • 5. Moore’s law Every 18 months, the number of CPU core’s double Another interpretation: Every 18 months, the number of idle CPU core’s double
  • 6. More parallelism We trade: Time vs ( CPU, Memory, I/O)
  • 7. Modern applications Scalability: Vertical: concurrency (use all the cores, memory and I/O of a given machine) Horizontal: distribution (use all the machines in the cluster) High availability: Fault tolerance: all levels (local, distributed) (the terminator effect: you can stop it but can’t kill it )
  • 8. Streaming applications Performance: Efficient use of resources: CPU and memory, but also OS threads and sockets Asynchronous: event driven, reacts on new data Distributed: more machines = more performance the algorithm is partitioned and/or replicated on the cluster
  • 9. What to increase? More CPU: It helps when there is computation involved More MEMORY: It helps when there is more state to keep More I/O: It helps when there are more messages to transfer
  • 10. Streaming or batch? ProcessingData Natalino Busa - 12 Feb. 2013 Data source system target systemour system What differentiate Streaming from Batch? ● Granularity of Data ● Granularity of Processing Granularity impacts: Throughput, Latency, and the Cost of the system!
  • 11. The choice is yours 1000 events/sec (1 KB/event) running on 100 cores all day long “Wait a day, then process” 860 M events = 86 GB of data Latency: 24 hours Throughput: 1 update/day BATCH: Hadoop Latency 1ms Throughput: 1000 updates/sec STREAMING: Akka “Do not wait” Process the 1KB of data each msec. “Both are valid options. It depends on the application domain and the requirements/specs of the target and source systems”
  • 12. Mapping it to existing applications Granularity of Data 256 GB 256 GB Granularity of Processing 1 CPU 100 CPU’s Traditional DB systems Big Data (Hadoop) Granularity of Data 1 KB 1 KB Granularity of Processing 1 CPU 100 CPU’s Traditional mail server Web application server
  • 15. Technologies: Akka Supervising: tree of actors Topology (statics and dynamic actors) Scaling and distributed processing
  • 16. Technology matrix GranularityofData Granularity of Processing Small Big Small Akka Akka Gigaspaces Big ? Storm System end-to-end throughput High ~ 10’000 events/sec Medium ~100 events/sec Low ~10 events/sec Akka Storm/ Gigaspaces Scripting languages
  • 17. Big Data in motion Both are: Distributed, fault-tolerant, streaming - Storm ++ multi-language -- not user/admin friendly -- slow supervising processing elements are jvm’s ideal when data is coarse grained - Akka ++ high throughput, fine grained actors ++ dynamic topologies -- low-level, but high performance processing elements are small and lightweight ideal for millions of transactions per second - Gigaspaces ++ combines memory + application distribution -- framework api is not very flexible processing elements are jvms ideal for all-in-one solution, with little customization
  • 18. Opportunity: Lambda Architecture Logic layer Software as a Service e.g realt-time predictor Natalino Busa - 12 Feb. 2013 from http://www.manning.com/marz/
  • 19. Opportunity: Batch + Streaming Batch Computing Front End Services In-Memory Distributed Database In-memory Distributed DB’s Batch Streaming HTML5 Client / Responsive Applow-latency HTTP API services FETCH (refresh) Streaming Computing Data Warehouses Messaging Busses PUSH (SSE, notifications)