SlideShare a Scribd company logo
1 of 20
Download to read offline
Michael A. Greene
Vice President, Software and Services Group
Intel Corporation
ACCELERATINGAPACHESPARK-BASED
ANALYTICSONINTELARCHITECTURE	
Follow me on Twitter: @greene1of5
INTELSOFTWAREWhen everything computes and connects, Intel software relates.
Cloud and Data CenterConnective FabricConnected Devices
1100110101101010010101010101110001101010011011001010101011000111001010100101010010101
Ecosystem
Enabling
OS
Enabling
Services
Intel Developer Zone
Other names and brands may be claimed as the property of others
Inaworldwhereeverythingcomputes
and connects, Intel® Software relates
“HARDWAREISTHEMEDIUM
THROUGHWHICHAMAZING
SOFTWAREEXPERIENCES	
AREREALIZED”
BETTERTOGETHER	
+	
COMMUNITY	
INTEL
PERFORMANCEPORTALFORAPACHESPARK	
OVERALLPERFORMANCE	
WW19	
489700c8
WW20	
8e3822a0
WW22	
530efe3e
WW23	
90c60692
WW24	
db81b9d8
HiBench Spark-Perf
NormalizedCompletionTime
10%	
8%	
6%	
4%	
2%	
0%	
-2%	
-4%	
-6%	
-8%	
Base=1.2 release
Weekly Performance
Assessment of
Spark Upstream
Other names and brands may be claimed as the property of others
http://01org.github.io/sparkscore/plaf1.html
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Configuration: Intel® Xeon® E5-2697 v2 with 128 GB RAM running JAVA 1.8.0_25, CDH
5.3.2 and the latest version of Spark upstream for stated timeframes.
Lower is Better
+	
CUSTOMERS	
BETTERTOGETHER	
INTEL
Data Processing
Time Reduced
94%
Computational
Time Reduced
92%
Other names and brands may be claimed as the property of others.
All performance tests were performed and are being reported by Youku Tudou. Please contact Youku Tudou for more information on any performance test reported here.
10X	
MODELSIZE	
4X	
SPEEDUP	
“With Intel optimized, distributed
machine learning on Spark, we
were able to scale our model size
by >10X, while reduce the training
time by ~4x; it helps us to run the
distributed machine learning on
Xeon clusters and provide better
service experience“
Hyton Deng
Director of Data Infrastructure
Other names and brands may be claimed as the property of others.
All performance tests were performed and are being reported by Tencent. Please contact Tencent for more information on any performance test reported here.
ECOSYSTEM 	
BETTERTOGETHER	
+	
INTEL
“It sometimes took weeks to write
new business applications. It was a
cumbersome process involving two
different teams to analyze real time
streaming data.”
Xiaohui, Liao
Senior Development Manager
Other names and brands may be claimed as the property of others
THECHALLENGE	
INFRASTRUCTURETEAM	
 APPLICATIONENGINEERS	
SQL	
+	
ü 
Other names and brands may be claimed as the property of others
Other names and brands may be claimed as the property of others
STRUCTUREDQUERIES	
STREAMPROCESSING
Spark Streaming Streaming SQL
Other names and brands may be claimed as the property of others
48% of developers use SQL
*Source: StackOverflow.com 2015 survey
THESOLUTION	
INFRASTRUCTURETEAM	
 APPLICATIONENGINEERS	
ü 
+	
 SQL	
ü 
Other names and brands may be claimed as the property of others
“With Streaming SQL for Apache
Spark, our Analysts were able to
develop business applications in a
matter of hours – a huge boost to
productivity.”
Xiaohui, Liao
Senior Development Manager
Other names and brands may be claimed as the property of others
JOINUSAND
CONTRIBUTE!	
https://github.com/Intel-bigdata/spark-streamingsql
01101011010100101010101011100011010100110110010101010110001110010101
011010100110110010101010110001110010101001010100101010110010010001100
SparkR
Streaming SQL
Performance Portal
Project Tungsten
Other names and brands may be claimed as the property of others
ACCELERATINGANALYTICS	
Extend • Stabilize • Optimize
BETTERTOGETHER	
+	
 +	
+	
Find out more:
software.intel.com/bigdata
DELIVERINGONTHEPROMISEOFBIGDATA	
CUSTOMERS	
COMMUNITY	
 ECOSYSTEM	
INTEL
© 2015 Intel Corporation
Intel, the Intel logo, Intel. Experience What's Inside, the Intel. Experience What's Inside logo, Intel Inside, the Intel Inside logo, and Xeon are
trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.  
Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or
other countries.
Java is a registered trademark of Oracle and/or its affiliates.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations
and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance
tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products.   For more complete information visit http://www.intel.com/performance.  
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced
web site and confirm whether referenced data are accurate.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that
are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations.
Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference
Guides for more information regarding the specific instruction sets covered by this notice.
Follow me on Twitter: @greene1of5

More Related Content

What's hot

Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Databricks
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Anya Bida
 
Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet Encryption
Databricks
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale PlatformsBest Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Databricks
 

What's hot (20)

Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
 
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
 
Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet Encryption
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale PlatformsBest Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easier
 
Functional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksFunctional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming Frameworks
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
ROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlowROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlow
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
 
Supporting Over a Thousand Custom Hive User Defined Functions
Supporting Over a Thousand Custom Hive User Defined FunctionsSupporting Over a Thousand Custom Hive User Defined Functions
Supporting Over a Thousand Custom Hive User Defined Functions
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
 
High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...
High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...
High Performance Enterprise Data Processing with Apache Spark with Sandeep Va...
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteWhy is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier Leaute
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba Search
 
Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)
 

Viewers also liked

Big Data Profiling
Big Data Profiling Big Data Profiling
Big Data Profiling
eXascale Infolab
 

Viewers also liked (20)

Flyby: Improved Dense Matrix Multiplication-(Tom Vacek, Thomson Reuters)
Flyby: Improved Dense Matrix Multiplication-(Tom Vacek, Thomson Reuters)Flyby: Improved Dense Matrix Multiplication-(Tom Vacek, Thomson Reuters)
Flyby: Improved Dense Matrix Multiplication-(Tom Vacek, Thomson Reuters)
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
A Data Frame Abstraction Layer for SparkR-(Chris Freeman, Alteryx)
 
Perspectives on Big Data & Analytics-(Doug Wolfe, CIA)
Perspectives on Big Data & Analytics-(Doug Wolfe, CIA)Perspectives on Big Data & Analytics-(Doug Wolfe, CIA)
Perspectives on Big Data & Analytics-(Doug Wolfe, CIA)
 
Lecture 04 - Granularity in the Data Warehouse
Lecture 04 - Granularity in the Data WarehouseLecture 04 - Granularity in the Data Warehouse
Lecture 04 - Granularity in the Data Warehouse
 
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
 
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
 
Big Data Profiling
Big Data Profiling Big Data Profiling
Big Data Profiling
 
Data Warehousing Datamining Concepts
Data Warehousing Datamining ConceptsData Warehousing Datamining Concepts
Data Warehousing Datamining Concepts
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 

Similar to Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Greene, Intel)

Cloud Technology: Now Entering the Business Process Phase
Cloud Technology: Now Entering the Business Process PhaseCloud Technology: Now Entering the Business Process Phase
Cloud Technology: Now Entering the Business Process Phase
finteligent
 

Similar to Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Greene, Intel) (20)

Driving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to ExascaleDriving Industrial InnovationOn the Path to Exascale
Driving Industrial InnovationOn the Path to Exascale
 
Intel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewIntel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overview
 
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...
	 Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...	 Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...
 
Accelerating Apache Spark with Intel QuickAssist Technology
Accelerating Apache Spark with Intel QuickAssist TechnologyAccelerating Apache Spark with Intel QuickAssist Technology
Accelerating Apache Spark with Intel QuickAssist Technology
 
Cloud Technology: Now Entering the Business Process Phase
Cloud Technology: Now Entering the Business Process PhaseCloud Technology: Now Entering the Business Process Phase
Cloud Technology: Now Entering the Business Process Phase
 
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...
 
Introduction to container networking in K8s - SDN/NFV London meetup
Introduction to container networking in K8s - SDN/NFV  London meetupIntroduction to container networking in K8s - SDN/NFV  London meetup
Introduction to container networking in K8s - SDN/NFV London meetup
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
 
Accelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersAccelerating AI Adoption with Partners
Accelerating AI Adoption with Partners
 
Microsoft Build 2019- Intel AI Workshop
Microsoft Build 2019- Intel AI Workshop Microsoft Build 2019- Intel AI Workshop
Microsoft Build 2019- Intel AI Workshop
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
 
AIDC Summit LA- Hands-on Training
AIDC Summit LA- Hands-on Training AIDC Summit LA- Hands-on Training
AIDC Summit LA- Hands-on Training
 
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyHigh Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge Economy
 
E5 Intel Xeon Processor E5 Family Making the Business Case
E5 Intel Xeon Processor E5 Family Making the Business Case E5 Intel Xeon Processor E5 Family Making the Business Case
E5 Intel Xeon Processor E5 Family Making the Business Case
 
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
Extend HPC Workloads to Amazon EC2 Instances with Intel and Rescale (CMP373-S...
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS Update
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
 
Intel XDK New - Intel Software Day 2013
Intel XDK New - Intel Software Day 2013Intel XDK New - Intel Software Day 2013
Intel XDK New - Intel Software Day 2013
 
Intel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application ShowcaseIntel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application Showcase
 

More from Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
 

Recently uploaded

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Greene, Intel)

  • 1. Michael A. Greene Vice President, Software and Services Group Intel Corporation ACCELERATINGAPACHESPARK-BASED ANALYTICSONINTELARCHITECTURE Follow me on Twitter: @greene1of5
  • 2. INTELSOFTWAREWhen everything computes and connects, Intel software relates. Cloud and Data CenterConnective FabricConnected Devices 1100110101101010010101010101110001101010011011001010101011000111001010100101010010101 Ecosystem Enabling OS Enabling Services Intel Developer Zone Other names and brands may be claimed as the property of others
  • 3. Inaworldwhereeverythingcomputes and connects, Intel® Software relates “HARDWAREISTHEMEDIUM THROUGHWHICHAMAZING SOFTWAREEXPERIENCES AREREALIZED”
  • 5. PERFORMANCEPORTALFORAPACHESPARK OVERALLPERFORMANCE WW19 489700c8 WW20 8e3822a0 WW22 530efe3e WW23 90c60692 WW24 db81b9d8 HiBench Spark-Perf NormalizedCompletionTime 10% 8% 6% 4% 2% 0% -2% -4% -6% -8% Base=1.2 release Weekly Performance Assessment of Spark Upstream Other names and brands may be claimed as the property of others http://01org.github.io/sparkscore/plaf1.html Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Configuration: Intel® Xeon® E5-2697 v2 with 128 GB RAM running JAVA 1.8.0_25, CDH 5.3.2 and the latest version of Spark upstream for stated timeframes. Lower is Better
  • 7. Data Processing Time Reduced 94% Computational Time Reduced 92% Other names and brands may be claimed as the property of others. All performance tests were performed and are being reported by Youku Tudou. Please contact Youku Tudou for more information on any performance test reported here.
  • 8. 10X MODELSIZE 4X SPEEDUP “With Intel optimized, distributed machine learning on Spark, we were able to scale our model size by >10X, while reduce the training time by ~4x; it helps us to run the distributed machine learning on Xeon clusters and provide better service experience“ Hyton Deng Director of Data Infrastructure Other names and brands may be claimed as the property of others. All performance tests were performed and are being reported by Tencent. Please contact Tencent for more information on any performance test reported here.
  • 10. “It sometimes took weeks to write new business applications. It was a cumbersome process involving two different teams to analyze real time streaming data.” Xiaohui, Liao Senior Development Manager Other names and brands may be claimed as the property of others
  • 11. THECHALLENGE INFRASTRUCTURETEAM APPLICATIONENGINEERS SQL + ü  Other names and brands may be claimed as the property of others
  • 12. Other names and brands may be claimed as the property of others STRUCTUREDQUERIES STREAMPROCESSING
  • 13. Spark Streaming Streaming SQL Other names and brands may be claimed as the property of others 48% of developers use SQL *Source: StackOverflow.com 2015 survey
  • 14. THESOLUTION INFRASTRUCTURETEAM APPLICATIONENGINEERS ü  + SQL ü  Other names and brands may be claimed as the property of others
  • 15. “With Streaming SQL for Apache Spark, our Analysts were able to develop business applications in a matter of hours – a huge boost to productivity.” Xiaohui, Liao Senior Development Manager Other names and brands may be claimed as the property of others
  • 17. 01101011010100101010101011100011010100110110010101010110001110010101 011010100110110010101010110001110010101001010100101010110010010001100 SparkR Streaming SQL Performance Portal Project Tungsten Other names and brands may be claimed as the property of others ACCELERATINGANALYTICS Extend • Stabilize • Optimize
  • 18. BETTERTOGETHER + + + Find out more: software.intel.com/bigdata DELIVERINGONTHEPROMISEOFBIGDATA CUSTOMERS COMMUNITY ECOSYSTEM INTEL
  • 19. © 2015 Intel Corporation Intel, the Intel logo, Intel. Experience What's Inside, the Intel. Experience What's Inside logo, Intel Inside, the Intel Inside logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.   Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a registered trademark of Oracle and/or its affiliates. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.   For more complete information visit http://www.intel.com/performance.   Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
  • 20. Follow me on Twitter: @greene1of5