Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Greene, Intel)

2,219 views

Published on

Presentation at Spark Summit 2015

Published in: Data & Analytics

Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Greene, Intel)

  1. 1. Michael A. Greene Vice President, Software and Services Group Intel Corporation ACCELERATINGAPACHESPARK-BASED ANALYTICSONINTELARCHITECTURE Follow me on Twitter: @greene1of5
  2. 2. INTELSOFTWAREWhen everything computes and connects, Intel software relates. Cloud and Data CenterConnective FabricConnected Devices 1100110101101010010101010101110001101010011011001010101011000111001010100101010010101 Ecosystem Enabling OS Enabling Services Intel Developer Zone Other names and brands may be claimed as the property of others
  3. 3. Inaworldwhereeverythingcomputes and connects, Intel® Software relates “HARDWAREISTHEMEDIUM THROUGHWHICHAMAZING SOFTWAREEXPERIENCES AREREALIZED”
  4. 4. BETTERTOGETHER + COMMUNITY INTEL
  5. 5. PERFORMANCEPORTALFORAPACHESPARK OVERALLPERFORMANCE WW19 489700c8 WW20 8e3822a0 WW22 530efe3e WW23 90c60692 WW24 db81b9d8 HiBench Spark-Perf NormalizedCompletionTime 10% 8% 6% 4% 2% 0% -2% -4% -6% -8% Base=1.2 release Weekly Performance Assessment of Spark Upstream Other names and brands may be claimed as the property of others http://01org.github.io/sparkscore/plaf1.html Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Configuration: Intel® Xeon® E5-2697 v2 with 128 GB RAM running JAVA 1.8.0_25, CDH 5.3.2 and the latest version of Spark upstream for stated timeframes. Lower is Better
  6. 6. + CUSTOMERS BETTERTOGETHER INTEL
  7. 7. Data Processing Time Reduced 94% Computational Time Reduced 92% Other names and brands may be claimed as the property of others. All performance tests were performed and are being reported by Youku Tudou. Please contact Youku Tudou for more information on any performance test reported here.
  8. 8. 10X MODELSIZE 4X SPEEDUP “With Intel optimized, distributed machine learning on Spark, we were able to scale our model size by >10X, while reduce the training time by ~4x; it helps us to run the distributed machine learning on Xeon clusters and provide better service experience“ Hyton Deng Director of Data Infrastructure Other names and brands may be claimed as the property of others. All performance tests were performed and are being reported by Tencent. Please contact Tencent for more information on any performance test reported here.
  9. 9. ECOSYSTEM BETTERTOGETHER + INTEL
  10. 10. “It sometimes took weeks to write new business applications. It was a cumbersome process involving two different teams to analyze real time streaming data.” Xiaohui, Liao Senior Development Manager Other names and brands may be claimed as the property of others
  11. 11. THECHALLENGE INFRASTRUCTURETEAM APPLICATIONENGINEERS SQL + ü  Other names and brands may be claimed as the property of others
  12. 12. Other names and brands may be claimed as the property of others STRUCTUREDQUERIES STREAMPROCESSING
  13. 13. Spark Streaming Streaming SQL Other names and brands may be claimed as the property of others 48% of developers use SQL *Source: StackOverflow.com 2015 survey
  14. 14. THESOLUTION INFRASTRUCTURETEAM APPLICATIONENGINEERS ü  + SQL ü  Other names and brands may be claimed as the property of others
  15. 15. “With Streaming SQL for Apache Spark, our Analysts were able to develop business applications in a matter of hours – a huge boost to productivity.” Xiaohui, Liao Senior Development Manager Other names and brands may be claimed as the property of others
  16. 16. JOINUSAND CONTRIBUTE! https://github.com/Intel-bigdata/spark-streamingsql
  17. 17. 01101011010100101010101011100011010100110110010101010110001110010101 011010100110110010101010110001110010101001010100101010110010010001100 SparkR Streaming SQL Performance Portal Project Tungsten Other names and brands may be claimed as the property of others ACCELERATINGANALYTICS Extend • Stabilize • Optimize
  18. 18. BETTERTOGETHER + + + Find out more: software.intel.com/bigdata DELIVERINGONTHEPROMISEOFBIGDATA CUSTOMERS COMMUNITY ECOSYSTEM INTEL
  19. 19. © 2015 Intel Corporation Intel, the Intel logo, Intel. Experience What's Inside, the Intel. Experience What's Inside logo, Intel Inside, the Intel Inside logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.   Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a registered trademark of Oracle and/or its affiliates. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.   For more complete information visit http://www.intel.com/performance.  

×