SlideShare a Scribd company logo
1 of 30
Download to read offline
Towards Benchmarking Modern Distributed
Streaming Systems
Grace Huang (jie.huang@intel.com)
Jun 15, 2015
Acknowledgement
Dev team
• Shilei, Qian
• Qi, Lv
• Ruirui, Lu
Advisors
• Tathagata Das (Spark PMC member,
committer)
• Saisai Shao
• Sean Zhong (Storm PMC member,
committer)
• Tianlun Zhang
Closely partnered with large web sites and ISVs on better user experiences
• Key contributions for better customer adoption. E.g., Usability, Scalability and Performance
• More utilities to improve the stability & scalability
• HiBench: The Cross platforms micro-benchmark suite for big data (https://github.com/intel-
hadoop/HiBench)
• HiMeter: the light-weight workflow based big data performance analysis tool
About US
Explore more
@ Intel Booth
 WHY we need benchmarking for streaming system?
 WHAT is StreamingBench?
 HOW to use it for Spark Streaming?
Agenda
4
Time is Gold
5
WHY
Velocity
Value
Variety
Volume
1. Big Data is flooding rather than streaming (FSI,
IOT, …)
2. Dig out more profits from various streams in a
real-time manner
3. Streaming+X is blooming ( W/ analytical query,
graph, machine learning)
Frequent Questions from our Partners
6
WHY
If you cannot measure it, you cannot improve it
---William Thomson
How to select a proper
streaming platform?
Frequent Questions from our Partners
7
WHY
If you cannot measure it, you cannot improve it
---William Thomson
Is the streaming
platform reliable and
fault-tolerant?
How to select a proper
streaming platform?
Frequent Questions from our Partners
8
WHY
If you cannot measure it, you cannot improve it
---William Thomson
Which factors can
impact the streaming
applications’
performance?
Is the streaming
platform reliable and
fault-tolerant?
How to select a proper
streaming platform?
Frequent Questions from our Partners
9
WHY
If you cannot measure it, you cannot improve it
---William Thomson
How to properly select
resources for the
streaming applications?
Which factors can
impact the streaming
applications’
performance?
Is the streaming
platform reliable and
fault-tolerant?
How to select a proper
streaming platform?
A streaming benchmark Utility consists of several micro workloads to
1. Understand various streaming systems
2. Grasp tuning knobs
3. Allocate proper resources
4. Improve streaming platforms further
Let’s meet StreamingBench
10
WHAT
Users
Devs
First Glance of StreamingBench
11
Workload Rational Complexity Input data Seed
Identity Directly emit the input record to output
without any transformation
Low - Stateless
computation
Average record size is 60 bytes
in Text
Sampling Select certain records from the input
streams randomly.
Projections To collect a subset of columns for use in
operations, i.e. a projection is the list of
columns selected.
Grep To search streams for the occurrence of a
string of characters that matches a
specified pattern.
Wordcount Count the word occurrence number thru
the entire stream
Medium – Stateful
computation
DistinctCount Count the event number distinctly thru
entire historical stream
Statistics The descriptive statistics including
historical MIN, MAX and SUM based
on the streaming data
Average record size is 200
bytes in numeric.
WHAT
Architecture anatomy
12
WHAT
Cloud
Streaming
Ops
Broker
Broker
Broker
Broker
Generator
Generator
Generator
Generator
Off-line Data Preparations Streaming Processing on the fly
Streaming
Ops
Streaming
Ops
Streaming
Ops
Score Description
THP Throughput
LTC Latency
TPF Throughput
penalty factor
LPF Latency
penalty factor
AR Availability
ratio
Streaming
Ops
Streaming
Ops
Streaming
Ops
Streaming
Ops
Servers
…
…
…
SeedFiles
Kafka
Quick Guide
Important Statement:
The following part is not the answer to those questions, but just showcasing how to explore the
answer.
Systems Under Test*
14
HOW
Hardware
• 3-node Kafka cluster(48 partitions); 3-node streaming system(Spark Streaming, Storm, Trident and Samza);
• 108 cores, 384G Mem. 36x1T SATA HDDs per working cluster, 10G NIC per node
Software (more details in backup pages)
• Keep default configurations for all the test platforms(not tuned), except changing
1. Memory size, parallelism for each worker
2. Batch size, E.g.,
• In Trident: 500MB fetchSizeBytes
• In Spark Streaming: 1 second batch duration (with limited
spark.streaming.kafka.maxRatePerPartition/spark.streaming.receiver.maxRate)
3. In Storm: TOPOLOGY_MAX_SPOUT_PENDING=1000
4. In Spark Streaming: Kafka direct mode;
1. How to select a proper streaming platform?
15
HOW
Throughputs Evaluation (The higher, the better)
• Spark Streaming(Kafka direct mode):
• ~2.5-3x Trident in “stateless” workloads(Network bound);
• close to Trident in “statefull” workload (low parttion#)*
• Trident:
• about x20 Storm (network bound with 500M fetchSizeByte);
• Storm
• lowest among all for ACK ON for reliability
• Wordcount failed in Strorm/Trident
• Samza:
• Performs 3x of Storm in Identity
• Close to Spark streaming in Wordcount (low partition#)
*See “Which factors can impact the streaming applications’ performance”
*with limited spark.streaming.kafka.maxRatePerPartition
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
1. How to select a proper streaming platform?
16
HOW
Latency Evaluation (The lower, the better)
• Storms: the lowest response time (about 0.08-0.17 seconds)
• Spark Streaming: is controlled* around 1.5 second (batch duration), but with higher THP score.
• Trident: highest latency (about 30 seconds)
• Since it fetches 500MB data in each partition.
• Cutting down fetchSize leads to lower throughput, but better
LTC score.
*with limited spark.streaming.kafka.maxRatePerPartition
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
1. How to select a proper streaming platform?
17
HOW
Latency Evaluation (The lower, the better)
• Storms: the lowest response time (about 0.08-0.17 seconds)
• Spark Streaming: is controlled* around 1.5 second (batch duration), but with higher THP score.
• Trident: highest latency (about 30 seconds)
• Since it fetches 500MB data in each partition.
• Cutting down fetchSize leads to lower throughput, but better
LTC score.
*with limited spark.streaming.kafka.maxRatePerPartition
Key Learning: To find the most suitable platforms by mapping your
requirement to the performance score
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
1. The new experimental Kafka direct mode (1.3); while the old receiver mode is more general
for all kinds of streaming sources.
2. It brings significant throughput improvements in Spark Streaming (WAL OFF)
3. Most of light-weight computation workloads in direct mode saturated 10G network
bandwidth usage (100%).
3. Which factors can impact the streaming applications’
performance? (A)
21
HOW
THP score for various modes for Spark Streaming LTC score for various modes for Spark Streaming
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
3. Which factors can impact the streaming applications’
performance? (B)
• WAL doesn’t bring any performance loss in
receive mode(Identity)
• Growing the partition number increases
total throughput in Wordcount (~1.67x)
22
HOW
THP and LTC scores for Spark Streaming W/ and W/O WAL THP and LTC scores in direct mode W/ different partition number
spark.streaming.receiver.writeAheadLog.enable = true
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
• Kryo serialization brings x1.34 throughput, x0.74 latency in receiver mode; (Identity)
• Since Kafka direct mode is network I/O bound, serialization doesn’t change much.
3. Which factors can impact the streaming applications’
performance? (C)
23
HOW
The total TPH score for various Serde in Identity The total LTC score for various Serde in Identity
spark.serializer=org.apache.spark.serializer.KryoSerializer;
spark.kryo.referenceTracking=false
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
• Kryo serialization brings x1.34 throughput, x0.74 latency in receiver mode; (Identity)
• Since Kafka direct mode is network I/O bound, serialization doesn’t change much.
3. Which factors can impact the streaming applications’
performance? (C)
24
HOW
The total TPH score for various Serde in Identity The total LTC score for various Serde in Identity
Key Learning: To simulate your workloads
and identify more key knobs
spark.serializer=org.apache.spark.serializer.KryoSerializer;
spark.kryo.referenceTracking=false
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Public version is upcoming soon in
HiBench 5.0 release
http://github.com/intel-hadoop/HiBench
Stay tuned …
KEY TAKEAWAY
27
More platforms
Enhance load generating
Better implementation
…
LISTEN to your voices and LEARN for more
Call for more supports from YOU
28
BACKUP
29
• Disabled ACK in Storm for Identity workload, brings x9.2 performance gain.
• To turn on ACK for better reliability. And Trident turns on ACK by default
3. Which factors can impact the streaming applications’
performance? (D)
30
HOW
The total TPH score for Storm in Identity
To disable ACK by TOPOLOGY_ACKER_EXECUTORS=0
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Node 3(data generators, i.e., Kafka) + 3(computing nodes) + 1(master)
Dual Processor nodes
CPU Xeon E5-2699 2.3GHz 18 physical cores
Memory 128GB
Hard disks 12 SATA HDDs 1T
NIC 10Gb
Hardware/Software configurations
31
Version Other configurations
OS/kernel version Ubuntu 14.04.2 LTS
3.16.0-30-generic x86_64
HugePage disabled;
Ext4, noatime, nodiratime;
ulimit –n 655360;
JDK version Oracle jdk1.8.0_25
Hadoop version Hadoop-2.3.0-cdh5.1.3
Streaming system configurations(1)
32
Version Other configurations
Storm, Trident 0.9.3 supervisor.slots.ports 6701 6702 6703 6704
nimbus.childopts -Xmx16g
supervisor.childopts -Xmx16g
worker.childopts -Xmx32g
TOPOLOGY_ACKER_EXECUTORS= 0 to disable ACK
TOPOLOGY_MAX_SPOUT_PENDING =1000
Spark 1.4 SNAPSHOT Deployment: standalone mode
SPARK_WORKER_CORES 72
SPARK_WORKER_MEMORY 100G
SPARK_WORKER_INSTANCES 1
SPARK_EXECUTOR_MEMORY 100G
SPARK_LOCAL_DIRS disk1-8
batch size: (changed that for healthy results) spark.streaming.kafka.maxRatePerPartition (direct-mode only)
spark.streaming.receiver.maxRate (receiver-mode only)
spark.serializer=org.apache.spark.serializer.KryoSerializer;
spark.kryo.referenceTracking false
spark.streaming.receiver.writeAheadLog.enable
Streaming system configurations(2)
33
Version Other configurations
Samza 0.8.0 yarn.nodemanager.aux-services: mapreduce_shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class: org.apache.hadoop.mapred.ShuffleHandler
yarn.nodemanager.vmem-pmem-ratio: 3.1
yarn.nodemanager.vmem-check-enabled: false
yarn.nodemanager.resource.memory-mb: 102400
yarn.scheduler.maximum-allocation-mb: 10240
yarn.scheduler.minimum-allocation-mb: 2048
yarn.nodemanager.resource.cpu-vcores: 40
Kafka
kafka_2.10-0.8.1
Each broker configuration:
num.network.threads 4
num.id.threads 4
socket.send.buffer.bytes 629145600
socket.receive.buffer.bytes 629145600
socket.request.max.bytes 1048576000
log.dirs disk1-4
num.partitions 4
log.segment.bytes 536870912
log.retention.check.interval.ms 60000
zookeeper.connection.timeout.ms 1000000
replica.lag.max.messages 10000000
• INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL
PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY
WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO
FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S
PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS,
OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR
INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS
SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
• Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked
"reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The
information here is subject to change without notice. Do not finalize a design with this information.
• The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized
errata are available on request.
• Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
• Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to:
http://www.intel.com/design/literature.htm
• Intel, the Intel logo, Intel Xeon, and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries.
• Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: Learn About
Intel® Processor Numbers http://www.intel.com/products/processor_number
• All the performance data are collected from our internal testing. Some results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any
difference in system hardware or software design or configuration may affect actual performance.
*Other names and brands may be claimed as the property of others.
Copyright © 2015 Intel Corporation. All rights reserved.
Notices and Disclaimers
34
35

More Related Content

What's hot

Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenDatabricks
 
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark Summit
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Databricks
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingOh Chan Kwon
 
Intro to Spark development
 Intro to Spark development  Intro to Spark development
Intro to Spark development Spark Summit
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengDatabricks
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Spark Summit
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareDatabricks
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Spark Summit
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierDatabricks
 
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzAnalytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzDatabricks
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Flink Forward
 
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Spark Summit
 
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Databricks
 

What's hot (20)

Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
 
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Intro to Spark development
 Intro to Spark development  Intro to Spark development
Intro to Spark development
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easier
 
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzAnalytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
 
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
 
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
 

Viewers also liked

Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsDataWorks Summit/Hadoop Summit
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Summit
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkFlink Forward
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Mobius: C# Language Binding For Spark
Mobius: C# Language Binding For SparkMobius: C# Language Binding For Spark
Mobius: C# Language Binding For SparkSpark Summit
 
Spark vs storm
Spark vs stormSpark vs storm
Spark vs stormTrong Ton
 
Facebookのリアルタイム Big Data 処理
Facebookのリアルタイム Big Data 処理Facebookのリアルタイム Big Data 処理
Facebookのリアルタイム Big Data 処理maruyama097
 
リアルタイム処理エンジン Gearpumpの紹介
リアルタイム処理エンジンGearpumpの紹介リアルタイム処理エンジンGearpumpの紹介
リアルタイム処理エンジン Gearpumpの紹介Sotaro Kimura
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 
Sparkストリーミング検証
Sparkストリーミング検証Sparkストリーミング検証
Sparkストリーミング検証BrainPad Inc.
 
Sparkパフォーマンス検証
Sparkパフォーマンス検証Sparkパフォーマンス検証
Sparkパフォーマンス検証BrainPad Inc.
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupTill Rohrmann
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsPat Patterson
 
Amazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTips
Amazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTipsAmazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTips
Amazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTipsyuichi_komatsu
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudStreamsets Inc.
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
ストリーミングのげんざい
ストリーミングのげんざいストリーミングのげんざい
ストリーミングのげんざいTetsuya Morimoto
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Flink Forward
 

Viewers also liked (20)

Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Mobius: C# Language Binding For Spark
Mobius: C# Language Binding For SparkMobius: C# Language Binding For Spark
Mobius: C# Language Binding For Spark
 
Spark vs storm
Spark vs stormSpark vs storm
Spark vs storm
 
Facebookのリアルタイム Big Data 処理
Facebookのリアルタイム Big Data 処理Facebookのリアルタイム Big Data 処理
Facebookのリアルタイム Big Data 処理
 
リアルタイム処理エンジン Gearpumpの紹介
リアルタイム処理エンジンGearpumpの紹介リアルタイム処理エンジンGearpumpの紹介
リアルタイム処理エンジン Gearpumpの紹介
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
Sparkストリーミング検証
Sparkストリーミング検証Sparkストリーミング検証
Sparkストリーミング検証
 
Sparkパフォーマンス検証
Sparkパフォーマンス検証Sparkパフォーマンス検証
Sparkパフォーマンス検証
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
Amazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTips
Amazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTipsAmazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTips
Amazon Elastic MapReduceやSparkを中心とした社内の分析環境事例とTips
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
ストリーミングのげんざい
ストリーミングのげんざいストリーミングのげんざい
ストリーミングのげんざい
 
Running Spark in Production
Running Spark in ProductionRunning Spark in Production
Running Spark in Production
 
Apache Spark 1000 nodes NTT DATA
Apache Spark 1000 nodes NTT DATAApache Spark 1000 nodes NTT DATA
Apache Spark 1000 nodes NTT DATA
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 

Similar to Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computingTimothy Spann
 
Disadvantages Of Robotium
Disadvantages Of RobotiumDisadvantages Of Robotium
Disadvantages Of RobotiumSusan Tullis
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureRobert Metzger
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin MeetupMárton Balassi
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharingconfluent
 
Introduction to NBL
Introduction to NBLIntroduction to NBL
Introduction to NBLFei Ji Siao
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
 
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022HostedbyConfluent
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at ScaleSean Zhong
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 
Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2abramsm
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Pavel Hardak
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Eren Avşaroğulları
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021StreamNative
 

Similar to Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel) (20)

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
 
Disadvantages Of Robotium
Disadvantages Of RobotiumDisadvantages Of Robotium
Disadvantages Of Robotium
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architecture
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin Meetup
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
Introduction to NBL
Introduction to NBLIntroduction to NBL
Introduction to NBL
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Io t data streaming
Io t data streamingIo t data streaming
Io t data streaming
 
Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...gajnagarg
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 

Recently uploaded (20)

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 

Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)

  • 1. Towards Benchmarking Modern Distributed Streaming Systems Grace Huang (jie.huang@intel.com) Jun 15, 2015
  • 2. Acknowledgement Dev team • Shilei, Qian • Qi, Lv • Ruirui, Lu Advisors • Tathagata Das (Spark PMC member, committer) • Saisai Shao • Sean Zhong (Storm PMC member, committer) • Tianlun Zhang
  • 3. Closely partnered with large web sites and ISVs on better user experiences • Key contributions for better customer adoption. E.g., Usability, Scalability and Performance • More utilities to improve the stability & scalability • HiBench: The Cross platforms micro-benchmark suite for big data (https://github.com/intel- hadoop/HiBench) • HiMeter: the light-weight workflow based big data performance analysis tool About US Explore more @ Intel Booth
  • 4.  WHY we need benchmarking for streaming system?  WHAT is StreamingBench?  HOW to use it for Spark Streaming? Agenda 4
  • 5. Time is Gold 5 WHY Velocity Value Variety Volume 1. Big Data is flooding rather than streaming (FSI, IOT, …) 2. Dig out more profits from various streams in a real-time manner 3. Streaming+X is blooming ( W/ analytical query, graph, machine learning)
  • 6. Frequent Questions from our Partners 6 WHY If you cannot measure it, you cannot improve it ---William Thomson How to select a proper streaming platform?
  • 7. Frequent Questions from our Partners 7 WHY If you cannot measure it, you cannot improve it ---William Thomson Is the streaming platform reliable and fault-tolerant? How to select a proper streaming platform?
  • 8. Frequent Questions from our Partners 8 WHY If you cannot measure it, you cannot improve it ---William Thomson Which factors can impact the streaming applications’ performance? Is the streaming platform reliable and fault-tolerant? How to select a proper streaming platform?
  • 9. Frequent Questions from our Partners 9 WHY If you cannot measure it, you cannot improve it ---William Thomson How to properly select resources for the streaming applications? Which factors can impact the streaming applications’ performance? Is the streaming platform reliable and fault-tolerant? How to select a proper streaming platform?
  • 10. A streaming benchmark Utility consists of several micro workloads to 1. Understand various streaming systems 2. Grasp tuning knobs 3. Allocate proper resources 4. Improve streaming platforms further Let’s meet StreamingBench 10 WHAT Users Devs
  • 11. First Glance of StreamingBench 11 Workload Rational Complexity Input data Seed Identity Directly emit the input record to output without any transformation Low - Stateless computation Average record size is 60 bytes in Text Sampling Select certain records from the input streams randomly. Projections To collect a subset of columns for use in operations, i.e. a projection is the list of columns selected. Grep To search streams for the occurrence of a string of characters that matches a specified pattern. Wordcount Count the word occurrence number thru the entire stream Medium – Stateful computation DistinctCount Count the event number distinctly thru entire historical stream Statistics The descriptive statistics including historical MIN, MAX and SUM based on the streaming data Average record size is 200 bytes in numeric. WHAT
  • 12. Architecture anatomy 12 WHAT Cloud Streaming Ops Broker Broker Broker Broker Generator Generator Generator Generator Off-line Data Preparations Streaming Processing on the fly Streaming Ops Streaming Ops Streaming Ops Score Description THP Throughput LTC Latency TPF Throughput penalty factor LPF Latency penalty factor AR Availability ratio Streaming Ops Streaming Ops Streaming Ops Streaming Ops Servers … … … SeedFiles Kafka
  • 13. Quick Guide Important Statement: The following part is not the answer to those questions, but just showcasing how to explore the answer.
  • 14. Systems Under Test* 14 HOW Hardware • 3-node Kafka cluster(48 partitions); 3-node streaming system(Spark Streaming, Storm, Trident and Samza); • 108 cores, 384G Mem. 36x1T SATA HDDs per working cluster, 10G NIC per node Software (more details in backup pages) • Keep default configurations for all the test platforms(not tuned), except changing 1. Memory size, parallelism for each worker 2. Batch size, E.g., • In Trident: 500MB fetchSizeBytes • In Spark Streaming: 1 second batch duration (with limited spark.streaming.kafka.maxRatePerPartition/spark.streaming.receiver.maxRate) 3. In Storm: TOPOLOGY_MAX_SPOUT_PENDING=1000 4. In Spark Streaming: Kafka direct mode;
  • 15. 1. How to select a proper streaming platform? 15 HOW Throughputs Evaluation (The higher, the better) • Spark Streaming(Kafka direct mode): • ~2.5-3x Trident in “stateless” workloads(Network bound); • close to Trident in “statefull” workload (low parttion#)* • Trident: • about x20 Storm (network bound with 500M fetchSizeByte); • Storm • lowest among all for ACK ON for reliability • Wordcount failed in Strorm/Trident • Samza: • Performs 3x of Storm in Identity • Close to Spark streaming in Wordcount (low partition#) *See “Which factors can impact the streaming applications’ performance” *with limited spark.streaming.kafka.maxRatePerPartition For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 16. 1. How to select a proper streaming platform? 16 HOW Latency Evaluation (The lower, the better) • Storms: the lowest response time (about 0.08-0.17 seconds) • Spark Streaming: is controlled* around 1.5 second (batch duration), but with higher THP score. • Trident: highest latency (about 30 seconds) • Since it fetches 500MB data in each partition. • Cutting down fetchSize leads to lower throughput, but better LTC score. *with limited spark.streaming.kafka.maxRatePerPartition For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 17. 1. How to select a proper streaming platform? 17 HOW Latency Evaluation (The lower, the better) • Storms: the lowest response time (about 0.08-0.17 seconds) • Spark Streaming: is controlled* around 1.5 second (batch duration), but with higher THP score. • Trident: highest latency (about 30 seconds) • Since it fetches 500MB data in each partition. • Cutting down fetchSize leads to lower throughput, but better LTC score. *with limited spark.streaming.kafka.maxRatePerPartition Key Learning: To find the most suitable platforms by mapping your requirement to the performance score For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 18. 1. The new experimental Kafka direct mode (1.3); while the old receiver mode is more general for all kinds of streaming sources. 2. It brings significant throughput improvements in Spark Streaming (WAL OFF) 3. Most of light-weight computation workloads in direct mode saturated 10G network bandwidth usage (100%). 3. Which factors can impact the streaming applications’ performance? (A) 21 HOW THP score for various modes for Spark Streaming LTC score for various modes for Spark Streaming For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 19. 3. Which factors can impact the streaming applications’ performance? (B) • WAL doesn’t bring any performance loss in receive mode(Identity) • Growing the partition number increases total throughput in Wordcount (~1.67x) 22 HOW THP and LTC scores for Spark Streaming W/ and W/O WAL THP and LTC scores in direct mode W/ different partition number spark.streaming.receiver.writeAheadLog.enable = true For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 20. • Kryo serialization brings x1.34 throughput, x0.74 latency in receiver mode; (Identity) • Since Kafka direct mode is network I/O bound, serialization doesn’t change much. 3. Which factors can impact the streaming applications’ performance? (C) 23 HOW The total TPH score for various Serde in Identity The total LTC score for various Serde in Identity spark.serializer=org.apache.spark.serializer.KryoSerializer; spark.kryo.referenceTracking=false For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 21. • Kryo serialization brings x1.34 throughput, x0.74 latency in receiver mode; (Identity) • Since Kafka direct mode is network I/O bound, serialization doesn’t change much. 3. Which factors can impact the streaming applications’ performance? (C) 24 HOW The total TPH score for various Serde in Identity The total LTC score for various Serde in Identity Key Learning: To simulate your workloads and identify more key knobs spark.serializer=org.apache.spark.serializer.KryoSerializer; spark.kryo.referenceTracking=false For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 22. Public version is upcoming soon in HiBench 5.0 release http://github.com/intel-hadoop/HiBench Stay tuned … KEY TAKEAWAY 27
  • 23. More platforms Enhance load generating Better implementation … LISTEN to your voices and LEARN for more Call for more supports from YOU 28
  • 25. • Disabled ACK in Storm for Identity workload, brings x9.2 performance gain. • To turn on ACK for better reliability. And Trident turns on ACK by default 3. Which factors can impact the streaming applications’ performance? (D) 30 HOW The total TPH score for Storm in Identity To disable ACK by TOPOLOGY_ACKER_EXECUTORS=0 For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 26. Node 3(data generators, i.e., Kafka) + 3(computing nodes) + 1(master) Dual Processor nodes CPU Xeon E5-2699 2.3GHz 18 physical cores Memory 128GB Hard disks 12 SATA HDDs 1T NIC 10Gb Hardware/Software configurations 31 Version Other configurations OS/kernel version Ubuntu 14.04.2 LTS 3.16.0-30-generic x86_64 HugePage disabled; Ext4, noatime, nodiratime; ulimit –n 655360; JDK version Oracle jdk1.8.0_25 Hadoop version Hadoop-2.3.0-cdh5.1.3
  • 27. Streaming system configurations(1) 32 Version Other configurations Storm, Trident 0.9.3 supervisor.slots.ports 6701 6702 6703 6704 nimbus.childopts -Xmx16g supervisor.childopts -Xmx16g worker.childopts -Xmx32g TOPOLOGY_ACKER_EXECUTORS= 0 to disable ACK TOPOLOGY_MAX_SPOUT_PENDING =1000 Spark 1.4 SNAPSHOT Deployment: standalone mode SPARK_WORKER_CORES 72 SPARK_WORKER_MEMORY 100G SPARK_WORKER_INSTANCES 1 SPARK_EXECUTOR_MEMORY 100G SPARK_LOCAL_DIRS disk1-8 batch size: (changed that for healthy results) spark.streaming.kafka.maxRatePerPartition (direct-mode only) spark.streaming.receiver.maxRate (receiver-mode only) spark.serializer=org.apache.spark.serializer.KryoSerializer; spark.kryo.referenceTracking false spark.streaming.receiver.writeAheadLog.enable
  • 28. Streaming system configurations(2) 33 Version Other configurations Samza 0.8.0 yarn.nodemanager.aux-services: mapreduce_shuffle yarn.nodemanager.aux-services.mapreduce.shuffle.class: org.apache.hadoop.mapred.ShuffleHandler yarn.nodemanager.vmem-pmem-ratio: 3.1 yarn.nodemanager.vmem-check-enabled: false yarn.nodemanager.resource.memory-mb: 102400 yarn.scheduler.maximum-allocation-mb: 10240 yarn.scheduler.minimum-allocation-mb: 2048 yarn.nodemanager.resource.cpu-vcores: 40 Kafka kafka_2.10-0.8.1 Each broker configuration: num.network.threads 4 num.id.threads 4 socket.send.buffer.bytes 629145600 socket.receive.buffer.bytes 629145600 socket.request.max.bytes 1048576000 log.dirs disk1-4 num.partitions 4 log.segment.bytes 536870912 log.retention.check.interval.ms 60000 zookeeper.connection.timeout.ms 1000000 replica.lag.max.messages 10000000
  • 29. • INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. • Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. • The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. • Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. • Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm • Intel, the Intel logo, Intel Xeon, and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries. • Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: Learn About Intel® Processor Numbers http://www.intel.com/products/processor_number • All the performance data are collected from our internal testing. Some results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. *Other names and brands may be claimed as the property of others. Copyright © 2015 Intel Corporation. All rights reserved. Notices and Disclaimers 34
  • 30. 35