Confidential ©2008-15 New Relic, Inc. All rights reserved.  
EVALUATING STREAMING FRAMEWORK
PERFORMANCE FOR A LARGE-SCALE
AGGREGATION PIPELINE
RON CROCKER
(rcrocker@newrelic.com)
PRINCIPAL ENGINEER & ARCHITECT
INGEST PIPELINE
EVERY MINUTE
▪ accepts over 16M requests
▪ stores over 2M analytic events
▪ aggregates over 800M metrics
▪ queries over 3B data points

▪ contains over 200 different services maintained/built by 25+ engineering teams

▪ more than 2.5 PETABYTES of SSD storage
Thanks for the pic! https://www.flickr.com/photos/stephenyeargin/7466608166
Goals for evaluating streaming systems
• Understand performance characteristics
• Understand operations characteristics
How New Relic works…
… the cartoon version
A1 An instance of your application running on a host
A2 Another instance of your application running on another host
An More instances of your application running on more hosts
…
New Relic Agent reports data to New Relic
▪ Agent Token (≈ account ID, agent ID)
▪ Duration (time-period covered)
▪ Timeslices: Each timeslice contains
  ▪ Metric name
  ▪ Metric stats
    ▪ Count, total time, exclusive time, min, max, sum of squares
HTTP post to <something>.newrelic.com
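The payload described above could be modeled roughly as follows. This is a minimal sketch, not New Relic's actual wire format: the class names (`MetricStats`, `Timeslice`, `AgentPayload`), the field names, the sample metric name, and the choice of seconds as the duration unit are all assumptions made to mirror the bullet list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetricStats:
    # The six statistics listed on the slide (field names are assumed).
    count: int
    total_time: float
    exclusive_time: float
    min_time: float
    max_time: float
    sum_of_squares: float


@dataclass
class Timeslice:
    metric_name: str
    stats: MetricStats


@dataclass
class AgentPayload:
    agent_token: str    # hypothetical: stands in for (account ID, agent ID)
    duration_s: float   # time period covered; seconds is an assumed unit
    timeslices: List[Timeslice] = field(default_factory=list)


# One agent reporting a single metric for a 60-second period.
payload = AgentPayload(
    agent_token="token-abc",
    duration_s=60.0,
    timeslices=[
        Timeslice("WebTransaction/index",
                  MetricStats(3, 0.9, 0.6, 0.2, 0.4, 0.29)),
    ],
)
```

Keeping the sum of squares alongside count and total time is what allows a standard deviation to be derived later without storing individual samples.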
[Pipeline diagram. Stages: HTTP termination, Timeslice Resolver, Minute Aggregator, Minute Writer, Hour Aggregator, Hour Writer, Other Consumers. Streams: raw_timeslice_data, resolved_timeslice_data, aggregated_minute_timeslices_data, aggregated_hourly_timeslices_data.]
▪ Agent Token (≈ account ID, agent ID)
▪ Duration (time-period covered)
▪ Timeslices: Each timeslice contains
  ▪ Metric name
  ▪ Metric stats
    ▪ Count, total time, exclusive time, min, max, sum of squares

▪ Account ID
▪ Agent ID
▪ Start time
▪ Duration (time-period covered)
▪ Timeslices: Each timeslice contains
  ▪ Metric name
  ▪ Metric stats
    ▪ Count, total time, exclusive time, min, max, sum of squares

▪ Account ID
▪ Agent ID
▪ Application Agent IDs
▪ Start time
▪ Duration (time-period covered)
▪ Timeslices: Each timeslice contains
  ▪ Metric ID
  ▪ Metric stats
    ▪ Count, total time, exclusive time, min, max, sum of squares

▪ Account ID
▪ Agent ID
▪ Timeslices: Each timeslice contains
  ▪ Metric ID
  ▪ Start time
  ▪ Duration (time-period covered)
  ▪ Metric stats
    ▪ Count, total time, exclusive time, min, max, sum of squares
The Experiment
Why Minute Aggregator?
▪ No external dependencies
  ▪ Performance comparisons solely focused on processing
▪ Repeatable
  ▪ We can compare across technologies without needing to normalize
▪ Important to our business
  ▪ Provides aggregation across instances of your application
  ▪ We could have benchmarked something else, like the Yahoo benchmark or word count, but would it have mattered?
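To make "aggregation across instances of your application" concrete, here is a minimal sketch of merging metric stats reported by several agents into one per-application, per-minute record. It is an illustration only, not the incumbent Minute Aggregator: the `Stats` class, the `aggregate_minute` function, the record tuple layout, and the sample values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    count: int
    total_time: float
    exclusive_time: float
    min_time: float
    max_time: float
    sum_of_squares: float

    def merge(self, other: "Stats") -> "Stats":
        # Counts and sums add; min and max take the extremes.
        return Stats(
            self.count + other.count,
            self.total_time + other.total_time,
            self.exclusive_time + other.exclusive_time,
            min(self.min_time, other.min_time),
            max(self.max_time, other.max_time),
            self.sum_of_squares + other.sum_of_squares,
        )


def aggregate_minute(records):
    """records: iterable of (app_id, metric_id, minute_start, Stats)."""
    merged = {}
    for app_id, metric_id, minute_start, stats in records:
        key = (app_id, metric_id, minute_start)
        merged[key] = merged[key].merge(stats) if key in merged else stats
    return merged


records = [
    ("app-1", 7, 0, Stats(2, 1.0, 0.5, 0.25, 0.75, 0.625)),  # from agent A1
    ("app-1", 7, 0, Stats(1, 2.0, 1.0, 2.0, 2.0, 4.0)),      # from agent A2
]
result = aggregate_minute(records)
```

Because every field merges with an associative, commutative operation, partial aggregates can be combined in any order, which is what lets the work be spread across parallel consumers.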
What about Hour Aggregator?
▪ Similar to Minute Aggregator
  ▪ No external dependencies, Repeatable, Important to the business
▪ Needs to run for several hours to understand performance
  ▪ … and I'm not that patient
▪ Extra credit: Integrate into stream implementations
Goals for evaluating streaming systems
• Understand performance characteristics
  • Performance at different arrival rates:
    • 100%
    • 6000%
    • To infinity and beyond
• Understand operations characteristics
  • No explicit goal
Evaluation Framework
[Diagram labels: Datacenter (Staging Kafka); AWS VPC (Experiment Kafka, Load driver, Baseline, Flink, Spark)]
AWS Configurations
▪ Kafka + ZK
  ▪ 3 i2.8xlarge hosts
▪ Baseline
  ▪ 3 m4.4xlarge hosts
▪ Flink
  ▪ 4 m4.4xlarge hosts
▪ Spark
  ▪ EMR - 1 master, 3 workers, all m4.4xlarge
AWS Configuration    i2.8xlarge   m4.4xlarge
Cores                32           16
RAM                  244GB        64GB
Network Bandwidth    10Gbps       2Gbps
Experimental Kafka system
▪ Kafka 0.8.2.2
  ▪ NR fork, includes backports of some 0.9 features
▪ # partitions: 16
  ▪ It's possible that this is too few partitions for the Baseline system
Load Driver
▪ Generates simple synthetic load based on real traffic
  ▪ Real traffic = output of Timeslice Resolver
  ▪ Load generated based on repeated messages
▪ Synthesizing interesting load is challenging:
  ▪ Un-bundle timeslices
  ▪ Re-bundle with new IDs (Agent, Account, and/or Metric)
  ▪ Repeat as necessary to get to load point
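The un-bundle, re-bundle, repeat steps can be sketched as below. This is an assumed illustration, not the actual load driver: the `synthesize` function, the dictionary layout of a bundle, and the ID-suffix scheme are all hypothetical, and only Account and Agent IDs are re-keyed here.

```python
def synthesize(bundles, copies):
    """Yield `copies` re-keyed variants of every recorded bundle."""
    for copy_no in range(copies):
        for bundle in bundles:
            # Un-bundle: take the timeslices out of the recorded message.
            timeslices = [dict(ts) for ts in bundle["timeslices"]]
            # Re-bundle under fresh Account and Agent IDs so that each
            # copy looks like traffic from a different reporter.
            yield {
                "account_id": f'{bundle["account_id"]}-{copy_no}',
                "agent_id": f'{bundle["agent_id"]}-{copy_no}',
                "timeslices": timeslices,
            }


# One recorded message, repeated three times to reach a higher load point.
recorded = [
    {"account_id": "acct1", "agent_id": "agentA",
     "timeslices": [{"metric_id": 1, "count": 5}]},
]
load = list(synthesize(recorded, copies=3))
```

In this sketch, `copies=3` emits three messages per recorded message. Leaving the IDs unchanged instead would scale only the rate, not the key space.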
Baseline system - Our incumbent Minute Aggregator
[Diagram labels: Kafka; Consume, Aggregate Agent, Aggregate Applications (shown twice, in parallel); Construct Minute Bundles; Produce; Kafka]
Distributions are not friendly…
Average # timeslices: 279
Geometric Mean # timeslices: 64
Median # timeslices: 44
Long tail…
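A small worked example shows why the three summary statistics diverge so sharply on a long-tailed distribution. The sample below is hypothetical, not New Relic data; it only reproduces the ordering median < geometric mean < average seen on the slide.

```python
import statistics

# Hypothetical long-tailed sample: most bundles are small, one is huge.
timeslices_per_bundle = [10, 20, 40, 40, 50, 80, 2000]

mean = statistics.mean(timeslices_per_bundle)            # pulled up by the tail
geo_mean = statistics.geometric_mean(timeslices_per_bundle)
median = statistics.median(timeslices_per_bundle)        # unaffected by the tail

print(f"mean={mean}, geometric mean={geo_mean:.1f}, median={median}")
```

A single large bundle pulls the average far above the typical bundle size, so sizing a system from the average alone would misjudge what most messages look like.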
Flink configuration
▪ Job Manager
▪ 3 Task Managers (16 slots each)
Spark configuration (AWS EMR)
▪ 1 Master
▪ 3 Slaves
But the Spark Streaming solution generates WRONG results
▪ … because there is no Event Time windowing
▪ … leading to me abandoning Spark Streaming
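The effect of missing event-time windowing can be illustrated without any streaming framework. The sketch below is plain Python, not Spark or Flink code; the `window_counts` function and the sample timestamps are assumptions. It assigns the same events to one-minute windows twice, once by the time the event occurred and once by the time it arrived.

```python
from collections import Counter

# (event_time_s, arrival_time_s, count). The second event belongs to
# minute 0 but reaches the aggregator during minute 1.
events = [(10, 12, 1), (55, 65, 1), (70, 72, 1)]


def window_counts(events, use_event_time):
    totals = Counter()
    for event_time, arrival_time, count in events:
        ts = event_time if use_event_time else arrival_time
        totals[ts // 60] += count  # one-minute tumbling windows
    return dict(totals)


by_event_time = window_counts(events, use_event_time=True)
by_arrival_time = window_counts(events, use_event_time=False)

print(by_event_time)    # {0: 2, 1: 1}
print(by_arrival_time)  # {0: 1, 1: 2}
```

Any delay between an agent reporting and the aggregator consuming moves data into the wrong minute when windows follow arrival time, which is one way a per-minute aggregate ends up wrong.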
Results
Technology   100%   500%   4000%   6000%   more…
Baseline
Flink
Spark
X Flat throughput with Kafka lag
X Flat throughput without Kafka lag
X Wrong answers…
Opportunities to improve the experiment
▪ MORE BANDWIDTH
  ▪ I don't know the limit of the Flink implementation
▪ Key space domain expansion [All]
  ▪ Scaling in rate domain only, with the same set of keys
  ▪ This is too easy on the key-based systems [Flink, Spark]
  ▪ This may be hard on the baseline system as well
▪ Inclusion of Database sinks [Flink, Spark]
  ▪ Kafka sinks are still needed for downstream functions
Thank you
Extra credit
Ron Crocker - Evaluating Streaming Framework Performance for a Large-Scale Aggregation Pipeline

More Related Content

Viewers also liked

Ted Dunning - Keynote: How Can We Take Flink Forward?
Ted Dunning -  Keynote: How Can We Take Flink Forward?Ted Dunning -  Keynote: How Can We Take Flink Forward?
Ted Dunning - Keynote: How Can We Take Flink Forward?Flink Forward
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkFlink Forward
 
Eron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsEron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsFlink Forward
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkFlink Forward
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkFlink Forward
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Flink Forward
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Flink Forward
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereFlink Forward
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Flink Forward
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Flink Forward
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateFlink Forward
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: AmadeusFlink Forward
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkFlink Forward
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
 
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Flink Forward
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingFlink Forward
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriFlink Forward
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFlink Forward
 
Eron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on MesosEron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on MesosFlink Forward
 

Viewers also liked (20)

Ted Dunning - Keynote: How Can We Take Flink Forward?
Ted Dunning -  Keynote: How Can We Take Flink Forward?Ted Dunning -  Keynote: How Can We Take Flink Forward?
Ted Dunning - Keynote: How Can We Take Flink Forward?
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Eron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsEron Wright - Flink Security Enhancements
Eron Wright - Flink Security Enhancements
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache Flink
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink-
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large State
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: Amadeus
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia Kalavri
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
 
Eron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on MesosEron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on Mesos
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Ron Crocker - Evaluating Streaming Framework Performance for a Large-Scale Aggregation Pipeline

  • 1. Confidential ©2008-15 New Relic, Inc. All rights reserved. Evaluating Streaming Framework Performance for a Large-Scale Aggregation Pipeline. Ron Crocker (rcrocker@newrelic.com), Principal Engineer & Architect, Ingest Pipeline
  • 7. Contains over 200 different services, built and maintained by 25+ engineering teams ▪ More than 2.5 petabytes of SSD storage
  • 8. Thanks for the pic! https://www.flickr.com/photos/stephenyeargin/7466608166
  • 11. Goals for evaluating streaming systems
      ▪ Understand performance characteristics
      ▪ Understand operations characteristics
  • 12. How New Relic works… the cartoon version
  • 13. A1: An instance of your application running on a host
      A2: Another instance of your application running on another host
      An: More instances of your application running on more hosts
  • 14. A1, A2, … An: New Relic Agent reports data to New Relic
  • 15. A1, A2, … An: HTTP post to <something>.newrelic.com
      ▪ Agent Token (≈ account ID, agent ID)
      ▪ Duration (time-period covered)
      ▪ Timeslices: Each timeslice contains
          ▪ Metric name
          ▪ Metric stats
              ▪ Count, total time, exclusive time, min, max, sum of squares
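The payload fields listed on slide 15 can be pictured as a small data structure. The following is a minimal sketch, not New Relic's actual wire format: the class names, field names, and the helper functions `summarize` and `std_dev` are hypothetical. It also shows one likely reason for carrying a sum of squares: a standard deviation can be recovered later from the stored stats alone.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetricStats:
    # The six stats named on the slide; field names are illustrative.
    count: int
    total_time: float
    exclusive_time: float
    min_time: float
    max_time: float
    sum_of_squares: float


@dataclass
class Timeslice:
    metric_name: str
    stats: MetricStats


@dataclass
class AgentPayload:
    agent_token: str    # resolves to roughly (account ID, agent ID)
    duration_s: float   # time period covered by this report
    timeslices: List[Timeslice] = field(default_factory=list)


def summarize(durations: List[float], exclusive_time: float) -> MetricStats:
    """Build stats from raw call durations (hypothetical agent-side helper)."""
    return MetricStats(
        count=len(durations),
        total_time=sum(durations),
        exclusive_time=exclusive_time,
        min_time=min(durations),
        max_time=max(durations),
        sum_of_squares=sum(d * d for d in durations),
    )


def std_dev(stats: MetricStats) -> float:
    """Population standard deviation from count, total, and sum of squares."""
    mean = stats.total_time / stats.count
    variance = stats.sum_of_squares / stats.count - mean * mean
    return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error


payload = AgentPayload(
    agent_token="example-token",
    duration_s=60.0,
    timeslices=[Timeslice("WebTransaction/example", summarize([1.0, 2.0, 3.0], 4.5))],
)
```

Because every field is either a sum, a minimum, or a maximum, two such stats records can be combined without access to the raw durations.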
  • 17. Pipeline diagram
      ▪ Stages: HTTP termination, Timeslice Resolver, Minute Aggregator, Minute Writer, Hour Aggregator, Hour Writer, Other Consumers
      ▪ Topics: raw_timeslice_data, resolved_timeslice_data, aggregated_minute_timeslices_data, aggregated_hourly_timeslices_data
      ▪ Message contents shown along the pipeline, in slide order:
          ▪ Agent Token (≈ account ID, agent ID); Duration (time-period covered); Timeslices, each containing Metric name and Metric stats (count, total time, exclusive time, min, max, sum of squares)
          ▪ Account ID; Agent ID; Start time; Duration (time-period covered); Timeslices, each containing Metric name and Metric stats (count, total time, exclusive time, min, max, sum of squares)
          ▪ Account ID; Agent ID; Application Agent IDs; Start time; Duration (time-period covered); Timeslices, each containing Metric ID and Metric stats (count, total time, exclusive time, min, max, sum of squares)
          ▪ Account ID; Agent ID; Timeslices, each containing Metric ID, Start time, Duration (time-period covered), and Metric stats (count, total time, exclusive time, min, max, sum of squares)
  • 18. The Experiment
  • 19. Pipeline diagram: HTTP termination, Timeslice Resolver, Minute Aggregator, Minute Writer, Hour Aggregator, Hour Writer, Other Consumers; raw_timeslice_data, resolved_timeslice_data, aggregated_minute_timeslices_data, aggregated_hourly_timeslices_data
  • 20. Why Minute Aggregator?
      ▪ No external dependencies
          ▪ Performance comparisons solely focused on processing
      ▪ Repeatable
          ▪ We can compare across technologies without needing to normalize
      ▪ Important to our business
          ▪ Provides aggregation across instances of your application
      ▪ We could have benchmarked something else, like the Yahoo benchmark or word count, but would it have mattered?
  • 21. Pipeline diagram: HTTP termination, Timeslice Resolver, Minute Aggregator, Minute Writer, Hour Aggregator, Hour Writer, Other Consumers; raw_timeslice_data, resolved_timeslice_data, aggregated_minute_timeslices_data, aggregated_hourly_timeslices_data
  • 22. What about Hour Aggregator?
      ▪ Similar to Minute Aggregator
          ▪ No external dependencies, repeatable, important to the business
      ▪ Needs to run for several hours to understand performance
          ▪ … and I'm that patient
      ▪ Extra credit: Integrate into stream implementations
  • 23. Goals for evaluating streaming systems
      ▪ Understand performance characteristics
          ▪ Performance at different arrival rates: 100%, 6000%, to infinity and beyond
      ▪ Understand operations characteristics
          ▪ No explicit goal
  • 25. AWS Configurations
      ▪ Kafka + ZK: 3 i2.8xlarge hosts
      ▪ Baseline: 3 m4.4xlarge hosts
      ▪ Flink: 4 m4.4xlarge hosts
      ▪ Spark: EMR, 1 master, 3 workers, all m4.4xlarge

      AWS Configuration    i2.8xlarge   m4.4xlarge
      Cores                32           16
      RAM                  244GB        64GB
      Network Bandwidth    10Gbps       2Gbps
  • 26. Experimental Kafka system
      ▪ Kafka 0.8.2.2
          ▪ NR fork, includes backports of some 0.9 features
      ▪ # partitions: 16
          ▪ It's possible that this is too few partitions for the Baseline system
  • 27. Load Driver
      ▪ Generates simple synthetic load based on real traffic
          ▪ Real traffic = output of Timeslice Resolver
          ▪ Load generated based on repeated messages
      ▪ Synthesizing interesting load is challenging:
          ▪ Un-bundle timeslices
          ▪ Re-bundle with new IDs (Agent, Account and/or Metrics)
          ▪ Repeat as necessary to get to load point
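The un-bundle, re-bundle, repeat steps on slide 27 could look roughly like the sketch below. This is an illustration of the idea, not the load driver used in the talk: the message layout (a dict with `account_id`, `agent_id`, `start_time`, `timeslices`), the `ID_STRIDE` constant, and the `synthesize` function are all assumptions. Each extra copy of the recorded traffic gets shifted IDs, so the key space grows along with the message rate.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical offset between copies; assumed larger than any real ID in the sample.
ID_STRIDE = 1_000_000


def synthesize(messages: List[Dict], multiplier: int) -> Iterator[Dict]:
    """Replay recorded messages `multiplier` times, giving each copy fresh IDs."""
    for copy in range(multiplier):
        offset = copy * ID_STRIDE  # copy 0 keeps the original IDs
        for msg in messages:
            # Un-bundle the timeslices, then re-bundle them under shifted IDs.
            rebundled = [
                (metric_id + offset, stats) for metric_id, stats in msg["timeslices"]
            ]
            yield {
                "account_id": msg["account_id"] + offset,
                "agent_id": msg["agent_id"] + offset,
                "start_time": msg["start_time"],
                "timeslices": rebundled,
            }


recorded = [
    {
        "account_id": 1,
        "agent_id": 10,
        "start_time": 0,
        "timeslices": [(5, (1, 1.0, 1.0, 1.0, 1.0, 1.0))],
    }
]
load = list(synthesize(recorded, 3))  # three messages with three distinct agent IDs
```

Replaying the same messages without shifting IDs scales the arrival rate only, which is the simpler form of load the slide describes as "based on repeated messages".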
  • 28. Baseline system - Our incumbent Minute Aggregator
      ▪ Kafka
      ▪ Consume, Aggregate Agent, Aggregate Applications (two instances shown)
      ▪ Construct Minute Bundles, Produce
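The consume, aggregate-by-agent, aggregate-by-application flow on slide 28 can be sketched as a keyed merge. This is a minimal sketch under stated assumptions, not the incumbent implementation: the message layout, the stats tuple order `(count, total, exclusive, min, max, sum_of_squares)`, and the functions `merge` and `aggregate_minute` are hypothetical. Each timeslice is merged into a bucket for its own agent and into a bucket for every application the agent belongs to.

```python
from typing import Dict, Iterable, Tuple

Stats = Tuple[int, float, float, float, float, float]
Key = Tuple[int, int, int, int]  # (account_id, agent_or_app_id, metric_id, minute)


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two stats records: sums add, min and max take the extremes."""
    return (
        a[0] + b[0],
        a[1] + b[1],
        a[2] + b[2],
        min(a[3], b[3]),
        max(a[4], b[4]),
        a[5] + b[5],
    )


def aggregate_minute(messages: Iterable[dict]) -> Dict[Key, Stats]:
    buckets: Dict[Key, Stats] = {}
    for msg in messages:
        # Align the event's start time (seconds) to the start of its minute.
        minute = msg["start_time"] - msg["start_time"] % 60
        targets = [msg["agent_id"]] + list(msg["application_agent_ids"])
        for metric_id, stats in msg["timeslices"]:
            for target in targets:
                key = (msg["account_id"], target, metric_id, minute)
                buckets[key] = merge(buckets[key], stats) if key in buckets else stats
    return buckets


messages = [
    {"account_id": 9, "agent_id": 1, "application_agent_ids": [100],
     "start_time": 120, "timeslices": [(7, (2, 3.0, 1.0, 1.0, 2.0, 5.0))]},
    {"account_id": 9, "agent_id": 2, "application_agent_ids": [100],
     "start_time": 150, "timeslices": [(7, (1, 4.0, 2.0, 4.0, 4.0, 16.0))]},
]
minute_buckets = aggregate_minute(messages)
```

In this example the two agents keep separate per-agent buckets, while the shared application ID 100 receives the merged stats from both.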
  • 31. Distributions are not friendly…
      ▪ Average # timeslices: 279
      ▪ Geometric Mean # timeslices: 64
      ▪ Median # timeslices: 44
      ▪ Long tail…
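The gap between the three summary numbers on slide 31 is typical of a long-tailed distribution. The sketch below uses a small made-up sample (not the talk's data) to show the same ordering: a few very large bundles pull the arithmetic mean far above the median, with the geometric mean in between.

```python
import statistics

# Made-up bundle sizes: mostly small, with one very large bundle in the tail.
timeslices_per_message = [10, 20, 40, 44, 50, 80, 2000]

mean = statistics.mean(timeslices_per_message)
geo_mean = statistics.geometric_mean(timeslices_per_message)
median = statistics.median(timeslices_per_message)

print(f"mean={mean:.1f} geometric_mean={geo_mean:.1f} median={median}")
```

For load testing, this suggests that sizing synthetic messages by the average alone would misrepresent most of the traffic.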
  • 32. Flink configuration: Job Manager; 3 Task Managers (16 slots each)
  • 37. But the Spark Streaming solution generates WRONG results
      ▪ … because there is no Event Time windowing
      ▪ … leading to me abandoning Spark Streaming
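Why the lack of event-time windowing produces wrong aggregates can be shown with a toy example. This is plain Python, not Spark or Flink code, and the event layout and `window_counts` function are assumptions for illustration. An event that belongs to one minute but arrives during the next is counted in the wrong window when bucketing by arrival time.

```python
from collections import Counter
from typing import Dict, List


def window_counts(events: List[dict], time_field: str) -> Dict[int, int]:
    """Sum counts into 60-second windows keyed by the chosen timestamp field."""
    counts: Counter = Counter()
    for event in events:
        window_start = event[time_field] // 60 * 60
        counts[window_start] += event["count"]
    return dict(counts)


events = [
    {"event_time": 50, "arrival_time": 55, "count": 1},
    {"event_time": 58, "arrival_time": 65, "count": 1},  # arrives after its minute ended
    {"event_time": 70, "arrival_time": 72, "count": 1},
]

by_event_time = window_counts(events, "event_time")      # {0: 2, 60: 1}
by_arrival_time = window_counts(events, "arrival_time")  # {0: 1, 60: 2}
```

For a minute aggregator the event-time result is the correct one, since each timeslice must land in the minute it describes regardless of when it was delivered.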
  • 38. Results
  • 39. Results
      ▪ Columns: Technology, 100 %, 500 %, 4000 %, 6000 %, more…
      ▪ Rows: Baseline, Flink, Spark
      ▪ Legend: X = Flat throughput with Kafka lag; X = Flat throughput without Kafka lag; X = Wrong answers…
  • 40. Opportunities to improve the experiment
      ▪ MORE BANDWIDTH
          ▪ I don't know the limit of the Flink implementation
      ▪ Key space domain expansion [All]
          ▪ Scaling in rate domain only, with the same set of keys
          ▪ This is too easy on the key-based systems [Flink, Spark]
          ▪ This may be hard on the baseline system as well
      ▪ Inclusion of Database sinks [Flink, Spark]
          ▪ Kafka sinks are still needed for downstream functions
  • 41. Thank you
  • 42. Extra credit
  • 43. Pipeline diagram: HTTP termination, Timeslice Resolver, Minute Aggregator, Minute Writer, Hour Aggregator, Hour Writer, Other Consumers; raw_timeslice_data, resolved_timeslice_data, aggregated_minute_timeslices_data, aggregated_hourly_timeslices_data
  • 46. Thank you