Scale Up Performance with Intel® Development Tools


A walkthrough of scaling up performance with Intel® Cluster Studio XE and Intel® Parallel Studio XE


  1. Scale Up Performance with Intel® Development Tools. Overview of Intel® Cluster Studio XE & Intel® Parallel Studio XE. June 19, 2013. Mike Lee
  2. Vision: span from few cores to many cores with consistent models, languages, tools, and techniques
  3. Diagram labels: Multicore CPU; Multicore CPU with Intel® MIC architecture coprocessor; Source; Compilers; Libraries, Parallel Models
  4. Game Changer
     Diagram labels: Multicore CPU; Multicore CPU with Intel® MIC architecture coprocessor; Source; Compilers; Libraries, Parallel Models
     “Unparalleled productivity… most of this software does not run on a GPU” - Robert Harrison, NICS, ORNL
     Source: R. Harrison, “Opportunities and Challenges Posed by Exascale Computing - ORNL's Plans and Perspectives”, National Institute of Computational Sciences, Nov 2011
  5. Intel® Parallel Studio XE
     • Intel® Inspector XE, Intel® VTune™ Amplifier XE, Intel® Advisor
     • Intel® C/C++ and Fortran Compilers w/OpenMP
     • Intel® MKL, Intel® Cilk Plus, Intel® TBB, and Intel® IPP
     + Intel® Trace Analyzer and Collector
     + Intel® MPI Library
  6. Intel® Parallel Studio XE
     • Intel® Inspector XE, Intel® VTune™ Amplifier XE, Intel® Advisor
     • Intel® C/C++ and Fortran Compilers w/OpenMP
     • Intel® MKL, Intel® Cilk Plus, Intel® TBB, and Intel® IPP
     Intel® Trace Analyzer and Collector
  7. More Cores. Wider Vectors. Performance Delivered. Intel® Parallel Studio XE 2013 and Intel® Cluster Studio XE 2013
     • Industry-leading performance from advanced compilers
     • Comprehensive libraries
     • Parallel programming models
     • Insightful analysis tools
     Scaling performance efficiently: serial performance; task & data parallel performance; distributed performance.
     More cores: multicore; many-core (50+ cores).
     Wider vectors: 128 bits; 256 bits; 512 bits.
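To make the "wider vectors" figures on slide 7 concrete, the following is a minimal sketch of the arithmetic: how many elements of a given type fit in one vector register of 128, 256, or 512 bits. The helper name `lanes` is hypothetical and is not part of any Intel tool; the counts in the comments assume the common 4-byte `float` and 8-byte `double`.

```cpp
#include <cstddef>

// Hypothetical helper (not an Intel API): number of elements of type T
// that fit in one vector register that is `vector_bits` wide.
template <typename T>
constexpr std::size_t lanes(std::size_t vector_bits) {
    return vector_bits / (8 * sizeof(T));
}

// Assuming a 4-byte float and an 8-byte double:
//   lanes<float>(128)  is 4     lanes<double>(128) is 2
//   lanes<float>(256)  is 8     lanes<double>(256) is 4
//   lanes<float>(512)  is 16    lanes<double>(512) is 8
```

Each doubling of the register width doubles the number of elements one vector instruction can process, which is why vectorized loops matter more as vectors get wider.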
  8. Support for Latest Intel Processors and Coprocessors
     Columns: Intel® Ivy Bridge microarchitecture / Intel® Haswell microarchitecture / Intel® Xeon Phi™ coprocessor
     • Intel® C++ and Fortran Compiler: ✔ AVX / ✔ AVX2, FMA3 / ✔ IMCI
     • Intel® TBB library: ✔ / ✔ / ✔
     • Intel® MKL library: ✔ AVX / ✔ AVX2, FMA3 / ✔
     • Intel® MPI library: ✔ / ✔ / ✔
     • Intel® VTune™ Amplifier XE†: ✔ Hardware Events / ✔ Hardware Events / ✔ Hardware Events
     • Intel® Inspector XE: ✔ Memory & Thread Checks / ✔ Memory & Thread / ✔ Memory & Thread††
     † Hardware events for new processors added as new processors ship.
     †† Analysis runs on multicore processors, provides analysis for multicore and many-core processors.
  9. A Family of Parallel Programming Models: Developer Choice
     • Intel® Cilk™ Plus: C/C++ language extensions to simplify parallelism. Open sourced. Also an Intel product.
     • Intel® Threading Building Blocks: Widely used C++ template library for parallelism. Open sourced. Also an Intel product.
     • Domain-Specific Libraries: Intel® Integrated Performance Primitives, Intel® Math Kernel Library
     • Established Standards: Message Passing Interface (MPI), OpenMP*, Coarray Fortran, OpenCL*
     • Research and Development: Intel® Concurrent Collections, Offload Extensions, Intel® SPMD Parallel Compiler
     Choice of high-performance parallel programming models. Applicable to multicore and many-core programming. Delivered with Intel® Cluster Studio XE.
  10. Intel® Cluster Studio XE: Tools to Scale Forward, Scale Faster – for HPC Clusters
      Columns: Phase / Product / Feature / Benefit
      • Build / Intel® MPI Library / High Performance Message Passing (MPI) Library / Enables high performance scalability, interconnect independence, runtime fabric selection, and application tuning capability
      • Build / Intel® Composer XE / C/C++ and Fortran compilers and performance libraries (Intel® Threading Building Blocks, Intel® Cilk™ Plus, Intel® Integrated Performance Primitives, Intel® Math Kernel Library) / Enabling solution to achieve the application performance and scalability benefits of multicore and forward scale to many-core
      • Verify / Intel® Inspector XE / Memory & threading dynamic analysis for code quality; static security analysis for code quality / Increases productivity and code quality and lowers cost; finds memory, threading, and security defects before they happen. Now MPI enabled at every cluster node
      • Verify & Tune / Intel® Trace Analyzer & Collector / MPI performance profiler for understanding application correctness & behavior / Analyze performance of MPI programs and visualize parallel application behavior and communication patterns to identify hotspots
      • Tune / Intel® VTune™ Amplifier XE / Performance profiler for optimizing application performance and scalability / Removes guesswork, saves time, and makes it easier to find performance and scalability bottlenecks. Now MPI enabled at every cluster node
  11. Intel® Composer XE – HPC Compilers & Libraries
      Great Application Performance. Serial or Parallel Programming.
      • Scale Forward & Flexibility: Target multicore & manycore systems on Linux*, Windows*, and OS X*
      • Standards Driven Compilers: Acclaimed Fortran and C++ compilers. Remarkable performance improvements with just a simple recompile
      • Parallel Programming Models & Libraries: Intel® TBB, Intel® Cilk™ Plus, Intel® OpenMP, Intel® Coarray Fortran, Intel® IPP & Intel® MKL
  12. Improved Compiler and Library Performance
  13. Compilers & Libraries: Intel® Cilk™ Plus & Intel® Threading Building Blocks
      Intel® Cilk™ Plus: language extensions to simplify task/data parallelism
      • 3 simple keywords & array notations for parallelism
      • Support for task and data parallelism
      • Semantics similar to serial code
      • Simple way to parallelize your code
      • Sequentially consistent, low overhead, powerful solution
      Intel® Threading Building Blocks: widely used C++ template library for task parallelism
      • Parallel algorithms and data structures
      • Scalable memory allocation and task scheduling
      • Synchronization primitives
      • Rich feature set for general purpose parallelism
      • Available as open source or commercial license
      Composability: Use the appropriate parallelism model in the same application with both Intel® Cilk™ Plus & Intel® Threading Building Blocks.
      Simplify Parallelism: Implement parallelism through open sourced models with simple language extensions/keywords & template libraries.
      Scale Forward & Flexibility: Target multicore & manycore systems on Linux*, Windows*, and OS X*.
  14. Compilers & Libraries: Intel® OpenMP. Welcome OpenMP 4.0!
      • OpenMP* 4.0 RC1 & TR1: Intel® C++ and Fortran Compiler adds support for SIMD extensions and target extensions.
      • 16 Years and Counting…: Intel supports and advances standards to advance the HPC industry
      • Available Now in Intel® Compilers: Intel® Fortran Composer XE 2013 Update 2 (version 13.1); Intel® C++ Composer XE Update 2 (version 13.1)
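As an illustration of the SIMD extensions that slide 14 mentions, here is a minimal sketch using the OpenMP 4.0 `simd` directive and its `reduction` clause. The function names `saxpy` and `dot` are chosen for illustration and do not come from the deck. The directives only take effect when OpenMP support is enabled in the compiler (for example with a flag such as `-fopenmp` in GCC; check your compiler's documentation for its equivalent); without it the loops run as ordinary serial code and give the same results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element.
// "#pragma omp simd" asks the compiler to vectorize the loop.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp simd
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}

// Dot product of x and y. The reduction clause tells the compiler that
// "sum" is accumulated across vector lanes and combined at the end.
float dot(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    const float* yp = y.data();
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += xp[i] * yp[i];
    }
    return sum;
}
```

Both functions are called like any ordinary C++ function, for example `saxpy(2.0f, x, y)` on two equally sized vectors. The loops are written over raw pointers with a signed index to keep the loop in the simple canonical form that the directive expects.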
  15. Intel® MPI Library – Flexible, Efficient & Scalable
      • Full Hybrid Support: Finely tuned control over threaded and OpenMP* hybrid regions for multicore and manycore systems
      • Sustainable Scalability: Take advantage of reduced memory overhead and native fabric support resulting in lower latencies and higher bandwidth
      • Optimized Performance: Automatically employ optimized collectives via cluster- and application-level tuning
      “Fast and accurate state of the art general purpose CFD solvers is the focus at S & I Engineering Solutions Pvt, Ltd. Scalability and efficiency are key to us when it comes to our choice and use of MPI Libraries. The Intel® MPI Library has enabled us to scale to over 10k cores with high efficiency and performance.” Nikhil Vijay Shende, Director, S & I Engineering Solutions, Pvt. Ltd.
  16. Intel® MPI Library – Flexible, Efficient & Scalable
  17. Intel® Math Kernel Library – Performance Ready to Use
      “Intel MKL is indispensable for any high-performance user” Prof. Jack Dongarra, Innovative Computing Lab, University of Tennessee
      • Flexible, Scalable and Compatible: Standard APIs for C & Fortran, compatible with present & future processors/coprocessors, compilers, OS’s, linking and threading models.
      • Vectorized and Threaded: Replace code with one of thousands of highly optimized functions for science, engineering and financial apps
      • Comprehensive Math Functionality: A wealth of threaded and vectorized complex math functions to accelerate a wide variety of software applications.
  18. Intel® Math Kernel Library – Performance Ready to Use
  19. Intel® Integrated Performance Primitives – Performance Ready to Use
      Engineered to Save Time. A Library of Highly Optimized Algorithmic Building Blocks for Media and Data Applications
      • Extensive & Rich Library: Thousands of optimized functions covering frequently used fundamental algorithms, including those for creating digital media, enterprise, data, embedded, communications, and scientific / technical applications.
      • Optimized for Performance: By using Intel® Streaming SIMD Extensions (Intel® SSE) and Intel® Advanced Vector Extensions (Intel® AVX) instructions, the library performs faster than what an optimizing compiler can produce alone.
  20. Intel® Integrated Performance Primitives – Performance Ready to Use
  21. Intel® Advisor XE – Data Driven Threading Design
      Simplifies and Speeds Threading Design. Best Results with Parallelism Design Insight and Analysis.
      • Evaluate Return on Investment: Performance benefit vs. the cost of transitioning to parallelism
      • Simplifies Adding Parallelism: Shorter learning curve for parallelism by helping to identify and experiment with parallel opportunities
      • Step-by-step Threading Guidance: From surveying code, finding the best implementation, to checking correctness.
  22. Intel® Advisor XE – Data Driven Threading Design. Add Parallelism with Less Effort, Less Risk and More Impact
  23. Intel® VTune™ Amplifier XE - Performance Profiler
      Optimize Serial & Parallel Performance. Premier Performance Profiler.
      • Easy: Performance optimization can be difficult, but the performance profiling tool you use shouldn’t be.
      • Rich Set of Performance Profiles: Collect a rich set of performance data for hotspots, threading, locks & waits, DirectX*, bandwidth and more.
      • Mine Results & Understand: Good data is not enough. Powerful analysis lets you sort, filter and visualize results on the timeline and on your source.
      “Last week, Intel® VTune™ Amplifier XE helped us find almost 3X performance improvement. This week it helped us improve the performance another 3X.” Claire Cates, Principal Developer, SAS Institute Inc
  24. Intel® VTune™ Amplifier XE - Performance Profiler. Advanced Profiling For Scalable Multicore Performance
      Where is my application…
      • Spending Time? Focus tuning on functions taking time. See call stacks. See time on source.
      • Wasting Time? See cache misses on your source. See functions sorted by # of cache misses.
      • Waiting Too Long? See locks by wait time. Red/Green for CPU utilization during wait.
  25. Intel® Inspector XE – Dynamic Analysis
      Deliver More Reliable Applications. Detect Memory & Threading Errors.
      • Flexible to Fit Workflow: Inspect C, C++, C#, F#, and Fortran. No special builds required. Inspects all code, even without source.
      • Find Errors Early in Development Cycle: Easy to use tool for serial and parallel applications enhances productivity, cuts cost and speeds time-to-results.
      • Memory & Threading Errors: Leaks, corruption, allocation/de-allocation, API mismatches, data races in stack and heap, deadlocks, and thread & sync API errors
      “We struggled for a week with a crash situation, …we ran Intel® Inspector XE and immediately found the array out of bounds that occurred long before the actual crash. We could have saved a week!” Mikael Le Guerroué, Senior Codec Architecture Engineer, Envivio
  26. Intel® Trace Analyzer and Collector
      Profile MPI Communications. Understand MPI Application Behavior.
      • Flexible to Fit Workflow: Use at compile, link, or run time to capture trace data for your application.
      • Powerful Analysis: Find temporal dependencies in your code: bottlenecks, hotspots, and load balancing issues; correctness checking
      • Low Overhead & Effective Visualization: Visualize and understand parallel application behavior at minimal cost to concentrate on relevant information quickly
  27. Learn More
      Intel® Software Development 30 Day Trials!
      Intel® Xeon® Processors & Intel® Xeon Phi™® Cluster Resources – Forums, papers, trainings & labs
      Copyright © 2012 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Sponsors of Tomorrow., and the Intel Sponsors of Tomorrow. logo are trademarks of Intel Corporation in the U.S. and other countries.
  28. Legal Disclaimer & Optimization Notice
      INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
      Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
      Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
      Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
      Copyright © 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
  29. Thank you