Real Time Analytics
Stream Processing and Beyond
Mahesh Madushanka
(Associate Technical Lead)
Colombo Big Data Meetup
Outline
• Real Time Analytics Overview
• Stream Processing Technologies
• Apache Storm as an ETL Tool
• Cake ETL
• Apache Storm best practices
• Limitations and challenges with stream processing
• Alternatives for stream processing
Analytics
“Discovery, Interpretation, and Communication of
meaningful patterns in data”
[Diagram: Source 1, Source 2, ..., Source n; ETL; Data Warehouse; Data Lake]
Stream Processing
“Analyze data as it is being produced”
Tuple { 1, "qqq", 23, "1233" }
Stream - Sequence of tuples
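As a rough illustration of the definition above, a stream can be modeled as a lazy sequence of tuples that is consumed while it is still being produced. The sketch below is plain Python, not Storm code; the `sensor_stream` generator and its field layout are hypothetical.

```python
from typing import Iterator, Tuple

Record = Tuple[int, str, int, str]  # same shape as the tuple on the slide


def sensor_stream(limit: int) -> Iterator[Record]:
    """Hypothetical source: yields one tuple at a time, like an unbounded stream."""
    for i in range(limit):
        yield (i, "qqq", 23 + i, "1233")


def running_total(stream: Iterator[Record]) -> Iterator[int]:
    """Analyze data as it is produced: emit an updated total after every tuple."""
    total = 0
    for _id, _name, value, _code in stream:
        total += value
        yield total


totals = list(running_total(sensor_stream(3)))
print(totals)
```

The consumer never waits for the whole data set; each tuple updates the result as soon as it arrives.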
• Apache Storm
http://storm.apache.org/
• IBM Streams
http://www-03.ibm.com/software/products/en/ibm-streams
• Tibco-streambase
http://www.tibco.com/products/tibco-streambase
• S4
http://incubator.apache.org/s4/
Apache Spark, is it a stream processing technology?
BATCH VS REAL TIME
Apache Storm
Spouts
A spout is a source of streams in a topology.
{Tuple,Tuple,Tuple,....}
Bolts
All processing in topologies is done in bolts.
[Diagram: bolts consuming and emitting streams such as {Tuple,Tuple,Tuple,....}, {Tu,Tu,Tu,....}, and {ple,ple,ple,....}]
Topology
A topology is a graph of spouts and bolts that are connected with stream groupings.
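To make the spout, bolt, and topology roles concrete, here is a minimal sketch in plain Python. It does not use the Storm API; `word_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical names, and the graph is assumed to be a simple linear chain.

```python
from typing import Callable, Iterable, Iterator, List


def word_spout() -> Iterator[tuple]:
    """Spout: the source of a stream (here a fixed, hypothetical list)."""
    for sentence in ["storm is fast", "storm is reliable"]:
        yield (sentence,)


def split_bolt(stream: Iterable[tuple]) -> Iterator[tuple]:
    """Bolt: consumes one stream and emits a new one (one tuple per word)."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterable[tuple]) -> Iterator[tuple]:
    """Bolt: keeps running counts and emits (word, count) for every input."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def run_topology(spout: Callable[[], Iterator[tuple]],
                 bolts: List[Callable]) -> List[tuple]:
    """Wire the spout to the bolts in order (a linear graph for simplicity)."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


result = run_topology(word_spout, [split_bolt, count_bolt])
print(result)
```

In a real cluster each spout and bolt would run as many parallel tasks, which is where stream groupings come in.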
Stream Grouping
A stream grouping defines how that stream should
be partitioned among the bolt's tasks.
1. Shuffle grouping: Tuples are randomly distributed across the bolt's tasks.
2. Fields grouping: The stream is partitioned by the fields specified in the grouping.
3. Partial Key grouping: Partitioned by the specified fields like Fields grouping, but load-balanced between two downstream tasks, which provides better utilization of resources when the incoming data is skewed.
4. All grouping: The stream is replicated across all the bolt's tasks.
5. Global grouping: The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
6. None grouping: Currently equivalent to shuffle grouping.
7. Direct grouping: The producer of the tuple decides which task of the consumer will receive this tuple.
8. Local or shuffle grouping: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.
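The groupings above can be thought of as functions that pick target task indices for each tuple. The following is a simplified sketch of four of them, not Storm's actual implementation; the function names and the CRC-based hash are assumptions chosen for illustration.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    """Shuffle: pick one task at random."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup: dict, fields: Sequence[str], num_tasks: int) -> List[int]:
    """Fields: equal values in the chosen fields always map to the same task."""
    key = repr(tuple(tup[f] for f in fields)).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    """All: replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(task_ids: Sequence[int]) -> List[int]:
    """Global: send everything to the task with the lowest id."""
    return [min(task_ids)]


first = fields_grouping({"user": "u1", "n": 1}, ["user"], 4)
second = fields_grouping({"user": "u1", "n": 2}, ["user"], 4)
print(first == second, all_grouping(3), global_grouping([7, 3, 5]))
```

Fields grouping is what makes per-key state (such as counters) safe, because one task sees every tuple for a given key.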
Reliability - ACK
Storm guarantees that every spout tuple will be fully
processed by the topology.
[Diagram: a chain of four tuples, each acknowledged with an ACK; the steps are numbered 1 to 8]
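Storm tracks the tuple tree of each spout tuple with a single 64-bit value: every tuple id is XORed in when the tuple is created and XORed in again when it is acked, so the value returns to zero exactly when the whole tree has been processed. The sketch below illustrates that idea only; the `Acker` class and its method names are hypothetical, and fixed ids are used here where Storm uses random 64-bit ids.

```python
class Acker:
    """Tracks one value per spout tuple; zero means fully processed."""

    def __init__(self) -> None:
        self.pending = {}

    def emit(self, root_id: int, tuple_id: int) -> None:
        # A new tuple joins the tree: XOR its id into the root's value.
        self.pending[root_id] = self.pending.get(root_id, 0) ^ tuple_id

    def ack(self, root_id: int, tuple_id: int) -> bool:
        # Acking XORs the same id a second time, which cancels it out.
        self.pending[root_id] ^= tuple_id
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            return True  # the whole tuple tree has completed
        return False


acker = Acker()
root, a, b, c = 1, 0x1111, 0x2222, 0x4444
acker.emit(root, a)          # the spout emits tuple a
acker.emit(root, b)          # a bolt emits b and c while processing a ...
acker.emit(root, c)
first = acker.ack(root, a)   # ... and only then acks a
second = acker.ack(root, b)
third = acker.ack(root, c)   # the last ack brings the value back to zero
print(first, second, third)
```

Children are registered before their parent is acked; otherwise the value could reach zero too early. The memory cost stays constant per spout tuple regardless of tree size.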
Storm as an ETL Tool
Talend ETL
[Diagram: HDFS Spout -> Filter Bolt -> MySQL lookup -> Aggregate Data -> MySQL Output]
[Diagram: the HDFS Spout to MySQL Output topology annotated with per-stage rates]
HDFS Spout: 100 Tuples/s
Filter Bolt: 10 Tuples/s
MySQL lookup: 30 Tuples/s
Aggregate Data: 40 Tuples/s
MySQL Output: 50 Tuples/s
[Diagram: the same topology and per-stage rates; end-to-end throughput: 10 Tuples/s]
[Diagram: the Filter Bolt is replicated three times at 10 Tuples/s each; end-to-end throughput: 30 Tuples/s]
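The numbers on these slides follow from a simple rule: a linear pipeline runs at the rate of its slowest stage, and adding parallel instances raises that stage's capacity. The sketch below reproduces the 10 and 30 Tuples/s figures under two assumptions: the stage-to-rate mapping is inferred from the order of the labels on the slides, and capacity scales linearly with parallelism.

```python
from typing import Dict

# Per-instance rates in tuples/s; the mapping to stages is an assumption.
RATES: Dict[str, int] = {
    "hdfs_spout": 100,
    "filter_bolt": 10,
    "mysql_lookup": 30,
    "aggregate": 40,
    "mysql_output": 50,
}


def throughput(rates: Dict[str, int], parallelism: Dict[str, int]) -> int:
    """End-to-end rate of a linear pipeline: the capacity of its slowest stage."""
    return min(rate * parallelism.get(stage, 1) for stage, rate in rates.items())


baseline = throughput(RATES, {})
scaled = throughput(RATES, {"filter_bolt": 3})
print(baseline, scaled)
```

After tripling the filter, the filter and the MySQL lookup both sit at 30 Tuples/s, so further gains would need more parallelism on both.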
CAKE ETL Framework
Cake ETL - Spouts
● HDFS Spout
● CSV Spout
● Kafka Spout
● …..
Cake ETL - Bolts
● Loader Bolt (MySQL, Redshift, ....)
● Filter Bolt
● Splitter Bolt
● …..
HDFS Spout
Xml
Type: HDFS Spout
Parallelism : 1
File Path : ……
Columns : {}
Records per Second : 100
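A component definition like the one above could be loaded as follows. This is only a sketch: the deck does not show the real Cake ETL schema, so the element names, the placeholder file path, and the two column names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout mirroring the fields listed on the slide.
CONFIG = """
<component>
  <type>HDFS Spout</type>
  <parallelism>1</parallelism>
  <filePath>/path/to/input</filePath>
  <columns><column>id</column><column>name</column></columns>
  <recordsPerSecond>100</recordsPerSecond>
</component>
"""


def load_component(xml_text: str) -> dict:
    """Parse one component definition into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.findtext("type"),
        "parallelism": int(root.findtext("parallelism")),
        "file_path": root.findtext("filePath"),
        "columns": [col.text for col in root.find("columns")],
        "records_per_second": int(root.findtext("recordsPerSecond")),
    }


spec = load_component(CONFIG)
print(spec)
```

Keeping parallelism and rate limits in configuration means a topology can be tuned without changing code.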
[Diagram: each component of the topology is configured through its own XML definition]
HDFS Spout: Type: HDFS Spout, Parallelism: 1, <Logic>
Filter Bolt (x3): Type: Filter, Parallelism: 3, <Logic>
MySQL lookup: Type: Mysql lookup, Parallelism: 1, <Logic>
Aggregate Data: Type: Aggregate, Parallelism: 1, <Logic>
MySQL Output: Type: Mysql out, Parallelism: 1, <Logic>
[Diagram: four servers (Server 1 to Server 4)]
[Diagram: Storm cluster components: Nimbus, ZooKeeper, and four Supervisors; each Worker runs Executors (threads), and each Executor runs Tasks]
[Diagram: HDFS Spout -> Filter Bolt (x3) -> MySQL lookup -> Aggregate Data -> MySQL Output]
* Topology Length = 5
Amazon EC2 instances (2 x c3.2xlarge)
8 CPU, 8M cache, 16 GB RAM, running CentOS 6.4 x64
#Workers = #CPU
[Diagram: workers on Server 1 and Server 2; seven workers each run 1 executor (Spout, Filter x3, MySQL lookup, Aggregate, MySQL out)]
[Diagram: the same topology with a tuple (Tuple ->) passing through it]
[Diagram: the same placement on 2 x c3.2xlarge, with the tuple's hops between workers numbered 1 to 4]
~160 Tuples/s
Storm best practices
Scenario 1
• Topology Length (#Transformation Steps) = 30
• Total Spout and Bolt Count = 30*
• # Executors / Tasks = 30
* Parallelism = 1 for all bolts and spouts
Amazon EC2 instances (2 x c3.2xlarge)
8 CPU, 8M cache, 16 GB RAM, running CentOS 6.4 x64
#Workers = #CPU
[Diagram: 16 workers on Server 1 and Server 2; per-worker counts of 2 (14 workers) and 1 (2 workers), 30 in total]
40 Tuples/s (Maximum)
Scenario 2
• Topology Length (#Transformation Steps) = 30
• Total Spout and Bolt Count = 172*
• # Executors / Tasks = 172
* Parallelism = 1-10
Objective = 100 Tuples/s
Amazon EC2 instances (2 x c3.2xlarge)
8 CPU, 8M cache, 16 GB RAM, running CentOS 6.4 x64
#Workers = #CPU
[Diagram: 16 workers on Server 1 and Server 2; per-worker counts of 11 (12 workers) and 10 (4 workers), 172 in total]
Scenario 3
Amazon EC2 instances (4 x c3.2xlarge)
8 CPU, 8M cache, 16 GB RAM, running CentOS 6.4 x64
#Workers = #CPU
[Diagram: Server 1 to Server 4, with a tuple's hops between workers numbered 1 to 8]
< 6 Tuples/s
Scenario 4
Amazon EC2 instances (2 x c3.4xlarge)
16 CPU, 30 GB RAM, running CentOS 6.4 x64
#Workers = #CPU
[Diagram: Server 1 and Server 2, with a tuple's hops between workers numbered 1 to 8]
< 6 Tuples/s
[Chart: Tuples/s for Scenario 1, Scenario 2, Scenario 3, and Scenario 4]
100 Tuples/s ?
Our Solution
Amazon EC2 instances (4 x c3.2xlarge)
8 CPU, 8M cache, 16 GB RAM, running CentOS 6.4 x64
8 CPU * 4 servers = 32 CPU = 32 workers
#Workers per Topology <= 8 *
#Executors per Worker <= 10
#Tasks per Executor = 2
Maximum Bolts/Spouts per Topology = 2 * 10 * 8 = 160
• Topology Length (#Transformation Steps) = 30
• Total Spout and Bolt Count = 156 (reduced from 172 to stay below 160)
• # Tasks = 156
• # Executors = 156 / 2 = 78
• # Workers = 78 / 10 = 7.8, rounded up to 8
100 Tuples/s
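The sizing arithmetic above can be written down as a small helper. It is a sketch of the rule of thumb on the slides, not a Storm API; the constant and function names are hypothetical.

```python
import math

TASKS_PER_EXECUTOR = 2
MAX_EXECUTORS_PER_WORKER = 10
MAX_WORKERS_PER_TOPOLOGY = 8


def plan(total_tasks: int) -> dict:
    """Derive executor and worker counts for one topology from its task count."""
    capacity = (TASKS_PER_EXECUTOR
                * MAX_EXECUTORS_PER_WORKER
                * MAX_WORKERS_PER_TOPOLOGY)
    if total_tasks > capacity:
        raise ValueError(f"{total_tasks} tasks exceed the limit of {capacity}")
    executors = math.ceil(total_tasks / TASKS_PER_EXECUTOR)
    workers = math.ceil(executors / MAX_EXECUTORS_PER_WORKER)
    return {"capacity": capacity, "executors": executors, "workers": workers}


sizing = plan(156)
print(sizing)
```

With 172 tasks the helper raises an error, which mirrors why the component count was brought down to 156 before deployment.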
Amazon EC2 instances (4 x c3.2xlarge)
8 CPU, 8M cache, 16 GB RAM, running CentOS 6.4 x64
[Diagram: one ETL Topology (8 workers, 4 servers) spread over Server 1 to Server 4]
~100 Tuples/s
92.59 Tuples/s
25 Transformations Steps per Tuple
Amazon EC2 instances (4 x c3.2xlarge)
8 CPU, 8M cache, 16 GB RAM, running CentOS 6.4 x64
[Diagram: three ETL topologies sharing Server 1 to Server 4]
ETL Topology (8 workers, 4 servers)
ETL Topology (8 workers, 4 servers)
ETL Topology (7 workers, 4 servers)
Limitations and Challenges
• Server cost: 4 nodes (c3.2xlarge) ~ $2000.00
• Nimbus and Supervisor Failures
• ACK - Memory Utilization
• 100 % CPU/Memory Utilization
Alternatives for stream processing
Column-oriented DBMS
● Why column Store: https://mariadb.com/resources/blog/why-columnstore-important
Questions?
mahesh.madushanka@trycake.com
References
● Data Lake - https://martinfowler.com/bliki/DataLake.html
● Batch vs Real Time data processing - http://www.datasciencecentral.com/profiles/blogs/batch-vs-real-time-data-processing
● Storm Concept : http://storm.apache.org/releases/2.0.0-SNAPSHOT/Concepts.html
● Why column Store: https://mariadb.com/resources/blog/why-columnstore-important
● Spark Streaming : http://sqlstream.com/2015/03/5-reasons-why-spark-streamings-batch-processing-of-data-streams-is-not-stream-processing/
● http://spark.apache.org/docs/latest/streaming-programming-guide.html
Spark Streaming receives live input data streams and divides the
data into batches, which are then processed by the Spark engine to
generate the final stream of results in batches.
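The micro-batch model described in the quotation can be sketched in a few lines. This is plain Python rather than Spark code; the `micro_batches` function and the `(timestamp, payload)` event shape are hypothetical, and events are assumed to arrive in time order.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload)


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Group a time-ordered stream into fixed-interval batches."""
    batch: List[str] = []
    window_end = None
    for ts, payload in events:
        if window_end is None:
            window_end = ts + interval
        while ts >= window_end:
            yield batch  # close the current batch (it may be empty)
            batch = []
            window_end += interval
        batch.append(payload)
    if batch:
        yield batch


events = [(0.0, "a"), (0.4, "b"), (1.1, "c"), (3.2, "d")]
batches = list(micro_batches(events, interval=1.0))
print(batches)
```

Each result is only available once its interval has closed, which is the latency difference from tuple-at-a-time processing.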

Real Time Analytics - Stream Processing (Colombo big data meetup 18/05/2017)

Editor's Notes

  1. ADD IT AS A QUESTION
  2. Generally spouts will read tuples from an external source and emit them into the topology // REMOVE ONE IMAGE
  3. Filtering/Aggregation/Join/Transform
  4. Filtering/Aggregation/Join/Transform
  5. Filtering/Aggregation/Join/Transform
  6. It does this by tracking the tree of tuples triggered by every spout tuple and determining when that tree of tuples has been successfully completed.
  7. Filtering/Aggregation/Join/Transform
  8. An ETL developer needs to code and implement it. It's not like dragging and dropping components
  9. Filtering/Aggregation/Join/Transform
  10. Filtering/Aggregation/Join/Transform
  11. Filtering/Aggregation/Join/Transform - MAX VALUE
  12. Filtering/Aggregation/Join/Transform
  13. Filtering/Aggregation/Join/Transform
  14. Filtering/Aggregation/Join/Transform
  15. Nimbus - coordination; ZooKeeper - distributed coordination; Supervisor - on each node
  16. WORKER and fonts. Nimbus - coordination; ZooKeeper - distributed coordination; Supervisor - on each node
  17. Spout = 1; Bolts (1*3 + 1 + 1 + 1) = 6; Total = 7; Executors (assume one task per executor) = 7; Worker processes = 16; Servers = 2
  18. COLORS. STORM RANDOM DISTRIBUTION
  19. Filtering/Aggregation/Join/Transform
  20. Remove the zeroes
  21. HOW CAN WE ACHIEVE THIS
  22. Arrow symbols MAX
  23. Arrow Symbols max
  24. Same as previous Netty - Network Communication Delay
  25. Huge drop Lmax - Network Communication Delay
  26. Lmax - Network Communication Delay add Limiting factor in separate slide
  27. Arrow Symbols Slide snapshots – arrow symbols
  28. If you want you can go with 2 Server (4 workers per each)
  29. Mark it as an ETL/ server