Real Time Analytics
Stream Processing and Beyond
Mahesh Madushanka
(Associate Technical Lead)
Colombo Big Data Meetup
Outline
• Real Time Analytics Overview
• Stream Processing Technologies
• Apache Storm as an ETL Tool
• Cake ETL
• Apache Storm best practices
• Limitations and challenges with stream processing
• Alternatives for stream processing
Analytics
“Discovery, Interpretation, and Communication of
meaningful patterns in data”
[Diagram: Source 1, Source 2, ... Source n; ETL; Data Warehouse; Data Lake]
Stream Processing
“Analyze data as it is being produced”
Tuple { 1, "qqq", 23, "1233" }
Stream - Sequence of tuples
• Apache Storm
http://storm.apache.org/
• IBM Streams
http://www-03.ibm.com/software/products/en/ibm-streams
• Tibco-streambase
http://www.tibco.com/products/tibco-streambase
• S4
http://incubator.apache.org/s4/
Apache Spark: is it a stream processing technology?
BATCH VS REAL TIME
Apache Storm
Spouts
A spout is a source of streams in a topology.
{Tuple,Tuple,Tuple,....}
Bolts
[Diagram: bolts consuming streams {Tuple,Tuple,Tuple,....} and emitting new streams, for example splitting each Tuple into {Tu,Tu,Tu,....} and {ple,ple,ple,....}]
All processing in topologies is done in bolts.
A topology is a graph of spouts and bolts
that are connected with stream groupings
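To make the spout, bolt and topology terms concrete, here is a minimal in-process sketch. It is not Storm code: the function names (number_spout, filter_bolt, transform_bolt, run_topology) are hypothetical, and Python generators stand in for streams of tuples.

```python
from typing import Callable, Iterable, Iterator, List

Record = tuple  # a Storm-style tuple modelled as a plain Python tuple


def number_spout(limit: int) -> Iterator[Record]:
    """Hypothetical spout: emits (id, value) tuples from an in-memory source."""
    for i in range(limit):
        yield (i, i * 10)


def filter_bolt(stream: Iterable[Record],
                keep: Callable[[Record], bool]) -> Iterator[Record]:
    """Bolt that drops every tuple for which keep() returns False."""
    for record in stream:
        if keep(record):
            yield record


def transform_bolt(stream: Iterable[Record]) -> Iterator[Record]:
    """Bolt that appends a derived field (value + 1) to each tuple."""
    for record in stream:
        yield record + (record[1] + 1,)


def run_topology(limit: int) -> List[Record]:
    """Wire spout -> filter -> transform, the simplest possible graph."""
    source = number_spout(limit)
    even_ids = filter_bolt(source, lambda r: r[0] % 2 == 0)
    return list(transform_bolt(even_ids))
```

In a real cluster each of these components would run as parallel tasks on different workers; the sketch only shows how tuples flow along the edges of the graph.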
Topology
Stream Grouping
A stream grouping defines how that stream should
be partitioned among the bolt's tasks.
1. Shuffle grouping: Tuples are randomly distributed across the bolt's tasks.
2. Fields grouping: The stream is partitioned by the fields specified in the grouping.
3. Partial Key grouping: Like Fields grouping, the stream is partitioned by the specified fields, but tuples are load balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed.
4. All grouping: The stream is replicated across all the bolt's tasks.
5. Global grouping: The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
6. None grouping: Currently equivalent to shuffle grouping.
7. Direct grouping: The producer of the tuple decides which task of the consumer will receive this tuple.
8. Local or shuffle grouping: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks.
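The routing decision behind four of these groupings can be sketched as functions that return the target task ids for one tuple. This is a simplified model, not Storm's implementation: the function names are hypothetical, and a CRC32 hash stands in for whatever hash Storm uses internally.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    """Pick one task at random."""
    return [rng.randrange(num_tasks)]


def fields_grouping(record: Sequence, field_indexes: Sequence[int],
                    num_tasks: int) -> List[int]:
    """Hash the selected fields so equal keys always reach the same task."""
    key = "|".join(str(record[i]) for i in field_indexes)
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(num_tasks: int) -> List[int]:
    """Send everything to the task with the lowest id."""
    return [0]
```

With fields grouping on the second field, the tuples (1, "qqq", 23) and (9, "qqq", 0) land on the same task, which is what makes per-key aggregation possible.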
Reliability - ACK
Storm guarantees that every spout tuple will be fully
processed by the topology.
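Storm tracks the tree of tuples spawned by each spout tuple and, according to its documentation, does so by XOR-ing tuple ids into a single value per spout tuple. The sketch below illustrates that idea only; the AckTracker class and the fixed ids are hypothetical and much simpler than the real acker.

```python
class AckTracker:
    """Tracks one spout tuple's tree with a single XOR checksum."""

    def __init__(self) -> None:
        self.ack_val = 0

    def anchor(self, tuple_id: int) -> None:
        """Record that a tuple with this id was emitted into the tree."""
        self.ack_val ^= tuple_id

    def ack(self, tuple_id: int) -> None:
        """Record that the tuple with this id finished processing."""
        self.ack_val ^= tuple_id

    def fully_processed(self) -> bool:
        """Every id has been XOR-ed in twice, so the checksum is back to 0."""
        return self.ack_val == 0


tracker = AckTracker()
tracker.anchor(0x1111)   # spout emits the root tuple
tracker.anchor(0x2222)   # first bolt emits a child tuple
tracker.ack(0x1111)      # first bolt acks the root tuple
pending = tracker.fully_processed()
tracker.ack(0x2222)      # second bolt acks the child tuple
done = tracker.fully_processed()
```

Because each id cancels itself out when it is acked, the memory needed per spout tuple stays constant no matter how large the tree grows, although the total still scales with the number of in-flight spout tuples.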
[Diagram: a chain of tuples, each acknowledged with an ACK]
Storm as an ETL Tool
Talend ETL
[Diagram: HDFS Spout -> Filter Bolt -> Mysql lookup -> Aggregate Data -> Mysql Output]
[Diagram: the same topology with maximum rates labelled on the components (in slide order): 100, 10, 30, 40 and 50 Tuples/s]
[Diagram: the same topology and rates; overall throughput 10 Tuples/s]
[Diagram: the topology with three parallel Filter Bolts at 10 Tuples/s each; overall throughput 30 Tuples/s]
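A minimal sketch of the bottleneck reasoning in these diagrams. It assumes the five rates (100, 10, 30, 40, 50 Tuples/s) belong to the components in the order the slides list them and that parallel instances scale linearly; both are assumptions, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# (max tuples/s per instance, parallelism). The rate-to-component mapping is
# an assumption based on the order in which the slides list the values.
Stage = Tuple[float, int]


def pipeline_throughput(stages: Dict[str, Stage]) -> float:
    """A linear pipeline can go no faster than its slowest stage."""
    return min(rate * parallelism for rate, parallelism in stages.values())


def bottleneck(stages: Dict[str, Stage]) -> str:
    """Name of the stage that limits the pipeline."""
    return min(stages, key=lambda name: stages[name][0] * stages[name][1])


single = {
    "HDFS Spout": (100, 1),
    "Filter Bolt": (10, 1),
    "Mysql lookup": (30, 1),
    "Aggregate Data": (40, 1),
    "Mysql Output": (50, 1),
}
# Same topology with the Filter Bolt run three times in parallel.
scaled = {**single, "Filter Bolt": (10, 3)}
```

With one Filter Bolt the pipeline is capped at 10 Tuples/s; with three it reaches 30 Tuples/s, at which point the Mysql lookup becomes an equal limit and adding more filters no longer helps.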
CAKE ETL Framework
Cake ETL - Spouts
● HDFS Spout
● CSV Spout
● Kafka Spout
● …..
Cake ETL - Bolts
● Loader Bolt (Mysql, Redshift, ....)
● Filter Bolt
● Splitter Bolt
● …..
HDFS Spout
Xml
Type: HDFS Spout
Parallelism : 1
File Path : ……
Columns : {}
Records per Second : 100
[Diagram: HDFS Spout -> Filter Bolt (x3) -> Mysql lookup -> Aggregate Data -> Mysql Output, each component defined by its own xml]
xml - Type: HDFS Spout, Parallelism: 1, <Logic>
xml - Type: Filter, Parallelism: 3, <Logic>
xml - Type: Mysql lookup, Parallelism: 1, <Logic>
xml - Type: Aggregate, Parallelism: 1, <Logic>
xml - Type: Mysql out, Parallelism: 1, <Logic>
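The per-component settings above determine how many executors the topology needs. A small sketch, assuming one task per executor as the speaker notes do; the ComponentConfig class is hypothetical and does not reflect the actual Cake ETL xml schema, which the slides do not show.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ComponentConfig:
    """Hypothetical stand-in for one component's xml definition."""
    kind: str
    parallelism: int


TOPOLOGY: List[ComponentConfig] = [
    ComponentConfig("HDFS Spout", 1),
    ComponentConfig("Filter", 3),
    ComponentConfig("Mysql lookup", 1),
    ComponentConfig("Aggregate", 1),
    ComponentConfig("Mysql out", 1),
]


def topology_length(components: List[ComponentConfig]) -> int:
    """Number of distinct steps, regardless of parallelism."""
    return len(components)


def executor_count(components: List[ComponentConfig]) -> int:
    """Total executors, assuming one task per executor."""
    return sum(c.parallelism for c in components)
```

For this example the topology length is 5 and the executor count is 1 + 3 + 1 + 1 + 1 = 7.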
[Diagram: cluster of four servers, Server 1 to Server 4]
[Diagram: Storm cluster architecture: Nimbus, Zookeeper and four Supervisors; Worker, Executors (threads), Task]
[Diagram: Workers spread across the servers]
[Diagram: HDFS Spout -> Filter Bolt (x3) -> Mysql lookup -> Aggregate Data -> Mysql Output]
* Topology Length = 5
Amazon EC2 Instance (2 c3.2xlarge)
8 CPU 8M cache 16GB RAM running Centos 6.4_x64 Kernel
#Workers = #CPU
[Diagram: the seven executors (Spout, 3 x Filter, MySQL lookup, Aggregate, MySQL out) placed one per worker across Server 1 and Server 2]
[Diagram: a tuple passing through the components in steps 1 to 4 across Server 1 and Server 2]
~ 160 Tuples/s
Storm best practices
Scenario 1
• Topology Length (#Transformation Steps) = 30
• Total Spout and Bolt Count = 30*
• #Executors / Tasks = 30
* Parallelism = 1 for all bolts and spouts
Amazon EC2 Instance (2 c3.2xlarge)
8 CPU 8M cache 16GB RAM running Centos 6.4_x64 Kernel
#Workers = #CPU
[Diagram: 16 workers across Server 1 and Server 2, each running 1 or 2 executors]
40 Tuples/s (Maximum)
Scenario 2
• Topology Length (#Transformation Steps) = 30
• Total Spout and Bolt Count = 172*
• #Executors / Tasks = 172
* Parallelism = 1-10
Objective = 100 Tuples/s
Amazon EC2 Instance (2 c3.2xlarge)
8 CPU 8M cache 16GB RAM running Centos 6.4_x64 Kernel
#Workers = #CPU
[Diagram: 16 workers across Server 1 and Server 2, each running 10 or 11 executors]
Scenario 3
Amazon EC2 Instance (4 c3.2xlarge)
8 CPU 8M cache 16GB RAM running Centos 6.4_x64 Kernel
#Workers = #CPU
[Diagram: workers across Server 1 to Server 4]
<6 Tuples/s
[Diagram: the same cluster with a tuple's path numbered 1 to 8 across the four servers]
6< Tuples/s
Scenario 4
Amazon EC2 Instance (2 c3.4xlarge)
16 CPU 30GB RAM running Centos 6.4_x64 Kernel
#Workers = #CPU
[Diagram: workers across Server 1 and Server 2, then the same cluster with a tuple's path numbered 1 to 8]
<6 Tuples/s
Scenario 1 Scenario 2
Scenario 3 Scenario 4
100 Tuples/s ?
Our Solution
Amazon EC2 Instance (4 c3.2xlarge)
8 cpu 8M cache 16GB RAM running Centos 6.4_x64 Kernel
8 CPU * 4 Servers = 32 CPU = 32 Workers
#Workers per Topology <= 8
#Executors per Worker <= 10
#Tasks per Executor = 2
Maximum Bolts/Spouts per Topology = 2 * 10 * 8 = 160
• Topology Length (#Transformation Steps) = 30
• Total Spout and Bolt Count = 172 -> 156 (< 160)
• #Tasks = 156
• #Executors = 156 / 2 = 78
• #Workers = 78 / 10 = 7.8, rounded up to 8
100 Tuples/s
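The sizing arithmetic on these two slides can be written out as a short sketch. The limits (8 workers per topology, 10 executors per worker, 2 tasks per executor) are the ones stated on the slides; the function names are hypothetical.

```python
import math
from typing import Tuple


def max_components(workers: int, executors_per_worker: int,
                   tasks_per_executor: int) -> int:
    """Upper bound on spout and bolt tasks that one topology can hold."""
    return workers * executors_per_worker * tasks_per_executor


def plan(tasks: int, tasks_per_executor: int,
         executors_per_worker: int) -> Tuple[int, int]:
    """Return (executors, workers) needed for the given number of tasks."""
    executors = math.ceil(tasks / tasks_per_executor)
    workers = math.ceil(executors / executors_per_worker)
    return executors, workers


limit = max_components(workers=8, executors_per_worker=10, tasks_per_executor=2)
executors, workers = plan(tasks=156, tasks_per_executor=2, executors_per_worker=10)
```

Here limit is 160, and 156 tasks need 78 executors and 8 workers, which fits within the 32 workers available on the four servers.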
Amazon EC2 Instance (4 c3.2xlarge)
8 CPU 8M cache 16GB RAM running Centos 6.4_x64 Kernel
[Diagram: one ETL Topology (8 workers, 4 servers) spread across Server 1 to Server 4]
~100 Tuples/s
92.59 Tuples/s
25 Transformation Steps per Tuple
[Diagram: three ETL Topologies sharing Server 1 to Server 4: two with 8 workers and one with 7 workers]
Limitations and Challenges
• Server Cost: 4 nodes (c3.2xlarge) ~ $2000.00
• Nimbus and Supervisor Failures
• ACK - Memory Utilization
• 100% CPU/Memory Utilization
Alternatives for stream processing
Column-oriented DBMS
● Why column Store: https://mariadb.com/resources/blog/why-columnstore-important
Questions?
mahesh.madushanka@trycake.com
References
● Data Lake - https://martinfowler.com/bliki/DataLake.html
● Batch vs Real Time data processing - http://www.datasciencecentral.com/profiles/blogs/batch-vs-real-time-data-processing
● Storm Concept : http://storm.apache.org/releases/2.0.0-SNAPSHOT/Concepts.html
● Why column Store: https://mariadb.com/resources/blog/why-columnstore-important
● Spark Streaming : http://sqlstream.com/2015/03/5-reasons-why-spark-streamings-batch-processing-of-data-streams-is-not-stream-processing/
● http://spark.apache.org/docs/latest/streaming-programming-guide.html
Spark Streaming receives live input data streams and divides the
data into batches, which are then processed by the Spark engine to
generate the final stream of results in batches.
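The difference between per-tuple processing and the micro-batching described in this quote can be sketched as follows. This is not Spark code: the function names are hypothetical, and a fixed batch size stands in for Spark Streaming's time-based batch interval to keep the example deterministic.

```python
from typing import Iterable, Iterator, List


def per_tuple(stream: Iterable[int]) -> Iterator[int]:
    """Stream processing: each item is handled as soon as it arrives."""
    for item in stream:
        yield item * 2


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Cut the stream into small batches (by count here, by time in Spark)."""
    batch: List[int] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly shorter, batch


def per_batch(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Micro-batch processing: results appear one batch at a time."""
    for batch in micro_batches(stream, batch_size):
        yield [item * 2 for item in batch]
```

Both variants compute the same values; what differs is latency, since in the batched variant an item waits until its batch is complete before any result is produced.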

Real Time Analytics - Stream Processing (Colombo big data meetup 18/05/2017)

Editor's Notes

  1. ADD IT AS A QUESTION
  2. Generally spouts will read tuples from an external source and emit them into the topology // REMOVE ONE IMAGE
  3. Filtering/Aggregation/Join/Transform
  4. Filtering/Aggregation/Join/Transform
  5. Filtering/Aggregation/Join/Transform
  6. It does this by tracking the tree of tuples triggered by every spout tuple and determining when that tree of tuples has been successfully completed.
  7. Filtering/Aggregation/Join/Transform
  8. ETL developers need to code and implement it. It's not like dragging and dropping components.
  9. Filtering/Aggregation/Join/Transform
  10. Filtering/Aggregation/Join/Transform
  11. Filtering/Aggregation/Join/Transform - MAX VALUE
  12. Filtering/Aggregation/Join/Transform
  13. Filtering/Aggregation/Join/Transform
  14. Filtering/Aggregation/Join/Transform
  15. Nimbus - Coordination; Zookeeper - Distributed Coordination; Supervisor - On each node
  16. WORKER and Fonts. Nimbus - Coordination; Zookeeper - Distributed Coordination; Supervisor - On each node
  17. Spout = 1; Bolts (1*3 + 1 + 1 + 1) = 6; Total = 7; Executors (assuming one task per executor) = 7; Worker Processes = 16; Servers = 2
  18. COLORS STORM RANDOM DISTRIBUTION
  19. Filtering/Aggregation/Join/Transform
  20. Remove the zeroes
  21. HOW CAN WE ACHIEVE THIS
  22. Arrow symbols MAX
  23. Arrow Symbols max
  24. Same as previous Netty - Network Communication Delay
  25. Huge drop Lmax - Network Communication Delay
  26. Lmax - Network Communication Delay add Limiting factor in separate slide
  27. Arrow Symbols Slide snapshots – arrow symbols
  28. If you want you can go with 2 Server (4 workers per each)
  29. Mark it as an ETL/ server