Introduction to the Intel® Xeon Phi™ Coprocessor - Intel Software Conference 2013
Talk given by Leonardo Borges at the Intel Software Conference on August 6 (NCC/UNESP/SP) and August 12 (COPPE/UFRJ/RJ).


  1. Introduction to the Intel® Xeon Phi™ Coprocessor
     Leo Borges (leonardo.borges@intel.com), Intel Software and Services Group
     iStep-Brazil, August 2013
     © 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
  2. Outline
     • Introduction
     • High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
     • Intel Xeon Phi Case Studies
     • Intel Xeon Phi Ecosystem
     • Conclusions & References
  3. Intel in High-Performance Computing
     A long-term commitment to the HPC market segment:
     • Large-scale clusters for test & optimization
     • Tera-scale research
     • Leading-performance, energy-efficient platform building blocks
     • Dedicated, renowned applications expertise
     • Broad software tools portfolio
     • Defined HPC application platform
     • Many Integrated Core architecture
     • Manufacturing process technologies
     • Exa-scale labs
  4. HPC Processor Solutions
     Common Intel environment: portable code, common tools.
     • Multi-core: Intel® Xeon® general-purpose architecture, with leadership per-core performance, FP throughput per core via AVX, and multi-core performance
       – EN: general-purpose performance per watt
       – EP: maximum performance per watt with higher memory bandwidth, frequency, and QPI; ideal for HPC
       – EP 4S: additional compute density
       – EX: additional sockets and big memory
     • Many-core: Intel® Xeon Phi™ coprocessor, which trades a "big" IA core for multiple lower-performance IA cores, resulting in higher performance for a subset of highly parallel applications
  5. Big Gains for Selected Applications
     • Most applications will still run best on multi-core Intel® Xeon® processors.
     • Optimizing code often delivers significant performance gains (chart: running existing serial software vs. running optimized software).
     • Highly parallel, vectorized applications, or those that need higher memory bandwidth, will run even faster on Intel® Xeon Phi™ coprocessors.
     Example domains: medical imaging and biophysics; computer-aided design & manufacturing; climate modeling & weather prediction; financial analyses and trading; energy & oil exploration; digital content creation.
  6. Evaluating Your Applications for Intel® Xeon Phi™
     Decision chart (yes/no questions):
     • Can your workload scale to over 100 threads?
     • Can your workload benefit from large vectors?
     • Can your workload benefit from more memory bandwidth?
     Use Intel® Xeon Phi™ coprocessors for applications that scale with threads, vectors, and memory bandwidth.
  7. Outline
     • Introduction
     • High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
     • Intel Xeon Phi Case Studies
     • Intel Xeon Phi Ecosystem
     • Conclusions & References
  8. Intel® Xeon Phi™ Product Family, Based on the Intel MIC Architecture
     Intel Many Integrated Core (MIC, pronounced "Mike") product family/architecture for highly parallel applications:
     • Based on a large number of smaller, low-power Intel Architecture cores
     • 512-bit wide vector engine
     • Complements the Intel Xeon processor product line
     • Provides breakthrough performance for highly parallel apps
       – Familiar x86 programming model
       – Same source code supports both the Intel Xeon processor and the Intel Xeon Phi coprocessor
       – Initially a coprocessor in a PCI Express form factor
     First products announced at SC12, code-named Knights Corner (KNC):
     • Up to 61 cores, 4 threads per core
     • Up to 16 GB GDDR5 memory (up to 352 GB/s)
     • 225-300 W (cooling: both passive and active SKUs)
     • x16 PCIe form factor (requires an IA host)
  9. Each Xeon Phi can be addressed as an individual node in the cluster
     • 6 to 16 GB GDDR5 memory
  10. Intel® Xeon Phi™ Coprocessors
     • 3 Family (3120P, 3120A): outstanding parallel computing solution, performance/$ leadership. 6 GB GDDR5, 240 GB/s, >1 TFlops DP
     • 5 Family (5120D, 5120P): optimized for high-density environments, performance/watt leadership. 8 GB GDDR5, >300 GB/s, >1 TFlops DP
     • 7 Family (7120P, 7120X): highest level of features, performance leadership. 16 GB GDDR5, 352 GB/s, >1.2 TFlops DP, Turbo
     INTEL CONFIDENTIAL
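The family-level figures on this slide can be reproduced with back-of-the-envelope arithmetic: peak DP throughput is cores × clock × 16 flops per cycle (8 doubles per 512-bit vector, times 2 for fused multiply-add), and peak memory bandwidth is transfer rate × bus width. The sketch below is illustrative only; the per-SKU core counts, clocks, GDDR5 transfer rates, and bus widths are assumptions taken from Intel's public specifications for the 3120P, 5110P, and 7120P, not figures stated in this deck.

```c
#include <assert.h>
#include <stdio.h>

/* Per-SKU figures below are assumptions from public spec sheets. */
typedef struct {
    const char *name;
    int cores;
    double ghz;       /* base clock in GHz */
    double gts;       /* GDDR5 transfer rate in GT/s */
    int bus_bits;     /* total memory interface width in bits */
} phi_sku;

/* 512-bit vector = 8 doubles; a fused multiply-add counts as 2 flops. */
enum { DP_FLOPS_PER_CYCLE = (512 / 64) * 2 };

/* Peak double-precision throughput in GF/s. */
double peak_dp_gflops(const phi_sku *s) {
    assert(s != NULL && s->cores > 0 && s->ghz > 0.0);
    return s->cores * s->ghz * DP_FLOPS_PER_CYCLE;
}

/* Peak memory bandwidth in GB/s: transfers per second x bytes per transfer. */
double peak_mem_gbs(const phi_sku *s) {
    assert(s != NULL && s->bus_bits % 8 == 0);
    return s->gts * (s->bus_bits / 8);
}

void print_skus(void) {
    const phi_sku skus[] = {
        {"3120P", 57, 1.100, 5.0, 384},
        {"5110P", 60, 1.053, 5.0, 512},
        {"7120P", 61, 1.238, 5.5, 512},
    };
    for (size_t i = 0; i < sizeof skus / sizeof skus[0]; ++i)
        printf("%s: %.0f GF/s DP peak, %.0f GB/s memory bandwidth\n",
               skus[i].name, peak_dp_gflops(&skus[i]), peak_mem_gbs(&skus[i]));
}
```

Calling `print_skus()` should report roughly 1,003, 1,011, and 1,208 GF/s and 240, 320, and 352 GB/s, in line with the ">1 TFlops", ">1.2 TFlops", and 240 / >300 / 352 GB/s labels above.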
  11. Outline
     • Introduction
     • High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
       – Performance Considerations
     • Intel Xeon Phi Case Studies
     • Intel Xeon Phi Ecosystem
     • Conclusions & References
  12. Application Characterization
     Based on memory access and flops required:
     • Temporal/spatial locality of data
     • Bandwidth requirement
     Chart placing math kernels, applications, and their segments between "bandwidth limited" and "core limited" (bandwidth axis marked at 6 GB/s): Stream-triad; BLAS1 & BLAS2 (all); Linpack; DGEMM (mfg & scientific); sparse matrix-vector (scientific); SPECfp2000 (all); reservoir simulation; FDTD (oil & gas); Kirchhoff migration (oil & gas); fluid dynamics; ocean models (scientific); FFT (oil & gas, mil HPC); option pricing (FSI); molecular dynamics (scientific); RTM (oil & gas).
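One standard way to make the bandwidth-limited versus core-limited split concrete is the roofline model (named here as an illustration, not as the method behind this chart): attainable performance is the lesser of peak compute and arithmetic intensity (flops per byte of memory traffic) times sustained bandwidth. In the sketch below, the 1,000 GF/s peak and 170 GB/s bandwidth used in the example are hypothetical round numbers, and the kernel intensities are idealized textbook estimates.

```c
#include <assert.h>

/* Roofline: performance is capped either by compute or by bandwidth. */
double attainable_gflops(double peak_gflops, double bw_gbs, double flops_per_byte) {
    assert(peak_gflops > 0.0 && bw_gbs > 0.0 && flops_per_byte > 0.0);
    double memory_bound = bw_gbs * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Nonzero when the kernel sits under the sloped (bandwidth) part of the roof. */
int is_bandwidth_limited(double peak_gflops, double bw_gbs, double flops_per_byte) {
    return bw_gbs * flops_per_byte < peak_gflops;
}

/* STREAM Triad a[i] = b[i] + s*c[i]: 2 flops per 3 doubles (24 bytes) moved. */
double triad_intensity(void) { return 2.0 / 24.0; }

/* DGEMM on n x n matrices: 2n^3 flops; idealized as each of the three
 * matrices crossing the memory bus once (3 * 8 * n^2 bytes). */
double dgemm_intensity(double n) {
    assert(n > 0.0);
    return (2.0 * n * n * n) / (3.0 * 8.0 * n * n);
}
```

With those hypothetical numbers, Triad (about 0.083 flop/byte) is capped near 14 GF/s by bandwidth, while a large DGEMM (about n/12 flop/byte) reaches the compute roof.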
  13. Synthetic Benchmarks: Intel® Xeon Phi™ Coprocessor and Intel® MKL (higher is better)
     Benchmark             2S Intel® Xeon®   Intel® Xeon Phi™ (ECC on)   Gain
     STREAM Triad (GB/s)   75                171                         up to 2.2X
     SMP Linpack (GF/s)    330               802 (75% efficient)         up to 2.4X
     DGEMM (GF/s)          347               887 (83% efficient)         up to 2.5X
     SGEMM (GF/s)          728               1,796 (84% efficient)       up to 2.4X
     Notes:
     1. Intel® Xeon® processor E5-2680 used for all. SGEMM matrix = 12800 x 12800, DGEMM matrix 10752 x 10752, SMP Linpack matrix 26000 x 26000.
     2. Intel® Xeon Phi™ coprocessor SE10P (ECC on) with "Gold" SW stack. SGEMM matrix = 12800 x 12800, DGEMM matrix 12800 x 12800, SMP Linpack matrix 26872 x 28672.
     3. Average single-node results from measurements across a set of nodes from the TACC Stampede* cluster (Texas Advanced Computing Center at the University of Texas at Austin).
     Coprocessor results: benchmark run 100% on the coprocessor, with no help from the Intel® Xeon® processor host (aka native).
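The STREAM Triad kernel measured above is a single streaming loop. The sketch below shows its shape with an OpenMP pragma, plus the byte count STREAM uses to turn a run time into GB/s; it is a simplified illustration, not the official STREAM source, and it omits the benchmark's timing and array-sizing rules. Compilers without OpenMP support ignore the pragma and run the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* STREAM-style Triad: a[i] = b[i] + scalar * c[i].
 * Threads split the index range; each thread's chunk is vectorized. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    #pragma omp parallel for simd schedule(static)
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + scalar * c[i];
}

/* STREAM counts two loads and one store of 8-byte doubles per element. */
double triad_gbs(size_t n, double seconds) {
    assert(seconds > 0.0);
    return 3.0 * sizeof(double) * (double)n / seconds / 1e9;
}
```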
  14. Performance per Watt: 1 Intel® Xeon Phi™ Coprocessor vs. 2-Socket Intel® Xeon® Processor (Intel MKL)
     Relative performance per watt, normalized to a 1.0 baseline of a 2-socket Intel® Xeon® processor E5-2670 (higher is better):
     • 2S Intel® Xeon® processor: 1.00
     • Intel® Xeon Phi™ 5110P, SMP Linpack: 3.91
     • Intel® Xeon Phi™ 5110P, DGEMM: 4.63
     • Intel® Xeon Phi™ 5110P, SGEMM: 4.81
     Notes:
     1. 2 x Intel® Xeon® processor E5-2670 (2.6 GHz, 8C, 115 W)
     2. Intel® Xeon Phi™ coprocessor 5110P (ECC on) with Gold RC SW stack (coprocessor power only)
     Coprocessor results: benchmark run 100% on the coprocessor, with no help from the Intel® Xeon® processor host (aka native).
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel measured results as of October 26, 2012. Configuration details: please reference slide speaker notes. For more information go to http://www.intel.com/performance
  15. Outline
     • Introduction
     • High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
       – Native, Offload and Variations
     • Intel Xeon Phi Case Studies
     • Intel Xeon Phi Ecosystem
     • Conclusions & References
  16. Range of Models to Meet Application Needs: A Wide Spectrum of Execution Models
     From multicore-centric (Intel® Xeon® processors) to many-core-centric (Intel® Many Integrated Core coprocessors):
     • Multi-core-hosted: general-purpose serial and parallel computing
     • Offload: codes with highly parallel phases
     • Symmetric: codes with balanced needs
     • Many-core-hosted: highly parallel codes
     (Diagram: Main(), Foo(), and MPI_*() placed on the multicore side, the many-core side, or both, for each model.)
  17. The Intel Manycore Platform Software Stack (MPSS) provides Linux on the coprocessor
     Diagram: the host processor and the Intel® Xeon Phi™ coprocessor each run Linux* with system-level and user-level code, connected over the PCI-E bus. The host side carries the Intel® Xeon Phi™ coprocessor support libraries, tools, and drivers; the coprocessor side carries the communication and application-launch support.
  18. Runs either as an accelerator for offloaded host computation…
     Diagram (on top of the MPSS stack): the host runs the host-side offload application (user code) with offload libraries, a user-level driver, and user-accessible APIs and libraries; the coprocessor runs the target-side offload application (user code) with offload libraries and user-accessible APIs and libraries.
     Advantages:
     • More memory available
     • Better file access
     • Host better on serial code
     • Better uses resources
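A minimal offload sketch, assuming the Intel compiler's Language Extensions for Offload (LEO): the pragma copies `x` and `y` to the coprocessor, runs the enclosed OpenMP loop there, and copies `sum` back. The dot-product kernel and the function name `dot_offload` are illustrative choices, not taken from the slides. As we understand it, `target(mic)` without a device number lets the Intel runtime fall back to the host when no coprocessor is present; other compilers simply ignore the unknown pragma and run the region on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product whose loop is offloaded to the coprocessor.
 * in(...): arrays copied host -> card; inout(sum): scalar copied both ways. */
double dot_offload(const double *x, const double *y, int n) {
    assert(n >= 0 && (n == 0 || (x != NULL && y != NULL)));
    double sum = 0.0;
    #pragma offload target(mic) in(x, y : length(n)) inout(sum)
    {
        /* Runs on the coprocessor's many threads when offloaded. */
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; ++i)
            sum += x[i] * y[i];
    }
    return sum;
}
```

The host keeps serial setup, file I/O, and large data structures, and only the parallel loop crosses the PCI-E bus, which is the trade-off listed under "Advantages" above.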
  19. …or runs as a native or MPI* compute node via IP or OFED
     Diagram: a target-side "native" application (user code), using standard OS libraries plus any 3rd-party or Intel libraries, runs on the coprocessor. Users reach it through an ssh or telnet connection to the coprocessor's IP address (virtual terminal session), and the coprocessor can attach to the IB fabric.
     Advantages:
     • Simpler model
     • No directives
     • Easier port
     • Good kernel test
     Use if:
     • Not serial
     • Modest memory
     • Complex code
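In native mode the whole program is cross-compiled for the coprocessor (with the Intel compiler, via the `-mmic` option), copied to the card, and launched from an ssh session as shown on the slide. Because the same source serves both targets, code that must differ can test the `__MIC__` macro, which the Intel compiler defines for coprocessor builds. The helper names below are hypothetical and the sketch is illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* The Intel compiler predefines __MIC__ when compiling for the coprocessor
 * (e.g. with -mmic); host builds leave it undefined. */
int built_for_coprocessor(void) {
#ifdef __MIC__
    return 1;
#else
    return 0;
#endif
}

/* The same source file reports which target it was built for. */
void report_target(FILE *out) {
    assert(out != NULL);
    fprintf(out, "built for %s\n",
            built_for_coprocessor() ? "the Intel Xeon Phi coprocessor (native)"
                                    : "the host processor");
}
```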
  20. Flexible: Enables Multiple Programming Models
     • Coprocessor only: a homogeneous network of many-core CPUs (MPI between coprocessors, data on each coprocessor)
     • Symmetric: a heterogeneous network of homogeneous CPUs (MPI ranks on both host CPUs and coprocessors)
     • Host + offload: a homogeneous network of heterogeneous nodes (MPI between hosts, each host offloading to its coprocessor)
  21. Comprehensive set of SW tools for Xeon and Xeon Phi programming
     • Code analysis: Advisor XE, VTune Amplifier XE, Inspector XE, Trace Analyzer
     • Programming models: Intel Cilk Plus, Threading Building Blocks, OpenMP, OpenCL, MPI, Offload/Native/MYO
     • Libraries & compilers: Math Kernel Library, Integrated Performance Primitives, Intel Compilers
     Copyright© 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
  22. Options for Thread Parallelism
     Spectrum from ease of use / code maintainability to programmer control: Intel® Math Kernel Library; OpenMP*; Intel® Threading Building Blocks; Intel® Cilk™ Plus; OpenCL*; Pthreads* and other threading libraries.
     Choice of unified programming to target Intel® Xeon® and Intel® Xeon Phi™ architecture!
  23. Outline
     • Introduction
     • High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
     • Intel Xeon Phi Case Studies
     • Intel Xeon Phi Ecosystem
     • Conclusions & References
  24. Parallelizing for High Performance. Example: A Two-Step Process with SAXPY
     • Starting point: unoptimized serial code running on multi-core Intel® Xeon® processors: 67.097 seconds
     • Step 1, optimize code: parallelize and vectorize the code and continue to run on multi-core Intel Xeon processors: 0.46 seconds (145X faster)
     • Step 2, use coprocessors: run all or part of the optimized code on Intel® Xeon Phi™ coprocessors: 0.197 seconds (2.3X faster than step 1)
     • Overall: 340X faster
     Step 2 results are based on already optimized code. Source: Intel measured results as of November 2012.
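A minimal C sketch of the two code versions behind this comparison, assuming OpenMP for both threading and vectorization (the slide does not say which model was used): `saxpy_serial` is the starting point and `saxpy_parallel` is step 1. Step 2 runs the same optimized loop on the coprocessor, natively or through offload. The `speedup` helper only reproduces the slide's arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Starting point: plain serial SAXPY, y = a*x + y. */
void saxpy_serial(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Step 1: spread iterations across threads and request SIMD code within
 * each thread's chunk. Without OpenMP the pragma is ignored (serial run). */
void saxpy_parallel(size_t n, float a, const float *restrict x,
                    float *restrict y) {
    #pragma omp parallel for simd schedule(static)
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Speedup as reported on the slide: old time divided by new time. */
double speedup(double before_s, double after_s) {
    assert(before_s > 0.0 && after_s > 0.0);
    return before_s / after_s;
}
```

With the slide's timings, `speedup(67.097, 0.46)` is about 145.9, `speedup(0.46, 0.197)` about 2.34, and `speedup(67.097, 0.197)` about 340.6, matching the rounded 145X, 2.3X, and 340X.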
  25. Performance Proof-Point: Government and Academic Research, BQCD
     • Application: Hybrid Monte-Carlo program that simulates lattice QCD with dynamical Wilson fermions. It is one of the main production programs of the QCDSF collaboration (DEISA) and beyond, used for quark simulations.
     • Status: many optimizations already in the released version; more optimizations and an alternative offload-model version in development
     • Demonstrated results:
       – No source code changes
       – Recompiled, with run-time parameters selected to get maximum performance
     "The performance improvement for BQCD using the Intel Xeon Phi coprocessor was reached in record time, requiring only recompilation. We are confident that larger speed-ups can be obtained with modest modifications of the code." Prof. Dr. Tilo Wettig, Principal Investigator of the QPACE project
     Chart: BQCD scalability in Gflops/sec (higher is better), x-axis 1, 2, 4, 8, for: 2S Intel® Xeon® processor E5-2670; Intel® Xeon Phi™ coprocessor, native (pre-production HW/SW); 2S Intel Xeon E5-2670 + Intel® Xeon Phi™ coprocessor, symmetric (pre-production HW/SW). Source: Intel measured, March '13.
  26. Performance Proof-Point: Energy Industry, CGG: Wave Equation Migration (WEM)
     • Application: seismic imaging technique used to obtain a subsurface depth image from input seismic data
     • Status: see the presentation at the Rice O&G HPC workshop, http://rice2013.og-hpc.org/technical-program
     • Execution model: fully hybrid MPI+OpenMP using symmetric mode; highly scalable on a cluster
     • Code optimization: minimal source code changes for dynamic load balancing
     Speedup (higher is better):
     • 2S Intel® Xeon® processor E5-2670, 4 MPI / 4 OMP: 1
     • Intel® Xeon Phi™ coprocessor (pre-production HW/SW), 12 MPI / 20 OMP: 2.57
     • 2S Intel Xeon processor E5-2670 (4/4) + Intel® Xeon Phi™ coprocessor (12/20) (pre-production HW/SW): 3.57
     • 2S Intel Xeon processor E5-2670 (4/4) + 2x Intel® Xeon Phi™ coprocessor (12/20 + 12/20) (pre-production HW/SW): 6.14
     Source: Arslan et al., CGG 2013, March '13
  27. Performance Proof-Point: Financial Services, Monte Carlo European Options
     • Application: Monte Carlo algorithms are used to evaluate complex instruments, portfolios, and investments. Performance depends on raw computational power and on the performance of exp2().
     • Status: case study available
     • Highlights: dramatic performance scaling for both single-precision and double-precision calculations
     • Demonstrated results:
       – Intel® Xeon Phi™ coprocessor fast exp2() and FMA instructions deliver high performance and high accuracy for single-precision computations
       – Compiler-based loop unrolling delivers high performance
       – Cache blocking further optimizes cache utilization, reduces cache misses, and makes outer-loop vectorization possible
     • Read the case study: software.intel.com/en-us/articles/case-study-achieving-high-performance-on-monte-carlo-european-option-on-intel-xeon-phi
     Speedup (higher is better), with 2S Intel® Xeon® processor E5-2670 = 1:
     • 2S Intel Xeon processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW): 10.36 (single precision), 3.34 (double precision)
     Source: Intel measured results as of July 2013
  28. Performance Proof-Point: Government and Academic Research, Weather Research and Forecasting (WRF)
     • Application: Weather Research and Forecasting (WRF)
     • Status: WRF V3.5 was released 4/18/13
     • Code optimization:
       – Approximately two dozen files with fewer than 2,000 lines of code were modified (out of approximately 700,000 lines of code in about 800 files, all Fortran-standard compliant)
       – Most modifications improved performance for both the host and the coprocessors
     • Performance measurements: pre-release of WRF 3.5 (V3.5Pre) and the NCAR-supported CONUS2.5KM benchmark (a high-resolution weather forecast)
     • Acknowledgments: there were many contributors to these results, including the National Renewable Energy Laboratory and The Weather Channel Companies
     Speedup (higher is better), eight-node cluster configuration:
     • 2S Intel® Xeon® processor E5-2670: 1
     • 2S Intel® Xeon® processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW): 1.4
     Source: Intel measured results as of July 2013
  29. Performance Proof-Point: Government and Academic Research, Sandia Mantevo miniFE
     • Application: Sandia National Laboratories' best approximation to an unstructured implicit finite element or finite volume application in fewer than 8,000 lines of code
     • Status: available at http://software.sandia.gov/trac/mantevo/browser/trunk/packages
     • Demonstrated results:
       – Porting was easy using OpenMP
       – Substituting an Intel MKL routine for the sparse matrix-vector product accelerated performance and will simplify future optimization
       – The Intel MPI Library enables rapid performance improvement when adding an Intel® Xeon Phi™ coprocessor
     • Read the case study: software.intel.com/en-us/articles/running-minife-on-intel-xeon-phi-coprocessors
     Speedup (higher is better): 2S Intel® Xeon® processor E5-2670 = 1; 2S Intel Xeon processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW) = 2.2. Source: Intel measured results as of March 2012.
     "The programming models available for the Intel MIC Architecture are open-standard and portable between traditional processors and Intel Xeon Phi coprocessors. This should allow us to leverage code development across multiple platforms." James A. Ang, Ph.D., Extreme-scale Computing, Sandia National Laboratories
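The sparse matrix-vector product mentioned above typically dominates a conjugate-gradient solve like the one in miniFE. A hand-written compressed sparse row (CSR) version, sketched below for illustration, shows why it parallelizes easily with OpenMP: every row is independent. This is not the Intel MKL routine the study substituted, and the function name and argument layout are our own.

```c
#include <assert.h>
#include <stddef.h>

/* y = A*x for A stored in CSR form:
 * row_ptr has nrows+1 entries; the column indices and values of row i
 * live at positions [row_ptr[i], row_ptr[i+1]) of col_idx and vals.
 * Rows are independent, so OpenMP can split them across threads. */
void csr_spmv(int nrows, const int *row_ptr, const int *col_idx,
              const double *vals, const double *x, double *y) {
    assert(nrows >= 0 && row_ptr != NULL);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nrows; ++i) {
        double acc = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            acc += vals[j] * x[col_idx[j]];   /* indirect (gather) load */
        y[i] = acc;
    }
}
```

The indirect load of `x[col_idx[j]]` is what makes this kernel bandwidth-limited rather than core-limited, which is one reason a tuned library routine can pay off.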
  30. Demonstrated Performance Benefits: Intel® Xeon Phi™ Coprocessor
     • Up to 2.23X: Acceleware 8th-order isotropic variable velocity, seismic (note 2)
     • Up to 2X: Sandia National Labs MiniFE, finite element analysis (note 1)
     • Up to 3.54X: China Oil & Gas Geoeast pre-stack time migration (note 3)
     Notes:
     1. 8-node cluster, each node with 2S Xeon* (comparison is cluster performance with and without 1 Xeon Phi* per node) (hetero)
     2. 2S Xeon* vs. 1 Xeon Phi* (pre-production HW/SW and application running 100% on coprocessor unless otherwise noted)
     3. 2S Xeon* vs. 2S Xeon* + 2 Xeon Phi* (offload)
     Source: Intel measured results as of November 2012
  31. Demonstrated Performance Benefits: Intel® Xeon Phi™ Coprocessor
     • Up to 10.75X: Monte Carlo SP, finance (note 3)
     • Up to 7X: Black-Scholes SP (note 3)
     • Up to 2.7X: Jefferson Lab lattice QCD, physics
     • Speed-up 2.11X: Intel Labs ray tracing, Embree (note 2)
     Notes:
     1. 2S Xeon* vs. 1 Xeon Phi* (pre-production HW/SW and application running 100% on coprocessor unless otherwise noted)
     2. Intel measured Oct. 2012
     3. Includes additional FLOPS from the transcendental function unit
     Source: Intel measured results as of November 2012
  32. Outline
     • Introduction
     • High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
     • Intel Xeon Phi Case Studies
     • Intel Xeon Phi Ecosystem
     • Conclusions & References
  33. Implementation Proof-Point: Government and Academic Research, Texas Advanced Computing Center (TACC)
     • System: TACC Stampede is a 10-petaflop supercomputer, one of the largest computing systems in the world for open science research. It became operational on January 7, 2013.
     • Status: in service
     • Workloads: runs hundreds of applications for thousands of users around the world
     • Performance (note 1):
       – More than 7 petaflops using Intel® Xeon Phi™ coprocessors
       – More than 2 petaflops using the Intel® Xeon® processor E5 family
     • More information:
       – SC12 interview: insidehpc.com/2012/12/06/video-intel-xeon-phi-powers-7-tacc-stampede-super/
       – TACC HPC systems overview: www.tacc.utexas.edu/resources/hpc
     1. http://www.tacc.utexas.edu/resources/hpc/stampede
  34. Tianhe-2 System: #1 on the June 2013 Top500 List
     • System: located in Southwest China, it contains 16,000 nodes, making up the world's largest (public) installation of Intel Ivy Bridge processors and Xeon Phi coprocessors. Each cluster node has:
       – 2 twelve-core Intel® Xeon® Ivy Bridge CPUs @ 2.2 GHz
       – 3 Intel® Xeon Phi™ cards, each with 57 cores @ 1.1 GHz
     • Performance: theoretical peak of 54.9 Pflop/s
       – 6.8 Pflop/s from 32,000 Xeon Ivy Bridge sockets
       – 48.1 Pflop/s from 48,000 Xeon Phi cards
       – 3,120,000 cores in total
       – 30.65 Pflop/s sustained on Linpack
     • More information: "Visit to the National University for Defense Technology Changsha, China." Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory, June 2013. www.netlib.org/utk/people/JackDongarra/PAPERS/tianhe-2-dongarra-report.pdf
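The totals on this slide can be checked with a few multiplications. The sketch assumes 12 cores per Ivy Bridge CPU (the only per-CPU count that reproduces the slide's 3,120,000-core total), 8 double-precision flops per cycle per Ivy Bridge core (256-bit AVX add plus multiply), and 16 per Xeon Phi core (512-bit fused multiply-add); the per-core flop rates are our assumptions, not slide figures.

```c
#include <assert.h>

/* Theoretical peak in GF/s = units x cores x GHz x DP flops per cycle. */
double peak_gflops(long units, int cores, double ghz, int flops_per_cycle) {
    assert(units > 0 && cores > 0 && ghz > 0.0 && flops_per_cycle > 0);
    return (double)units * cores * ghz * flops_per_cycle;
}

typedef struct {
    long cpu_sockets, phi_cards, total_cores;
    double cpu_pflops, phi_pflops;
} tianhe2_totals;

tianhe2_totals tianhe2(void) {
    const long nodes = 16000;
    tianhe2_totals t;
    t.cpu_sockets = nodes * 2;   /* 2 Ivy Bridge CPUs per node */
    t.phi_cards = nodes * 3;     /* 3 Xeon Phi cards per node  */
    t.total_cores = t.cpu_sockets * 12 + t.phi_cards * 57;
    t.cpu_pflops = peak_gflops(t.cpu_sockets, 12, 2.2, 8) / 1e6;  /* GF -> PF */
    t.phi_pflops = peak_gflops(t.phi_cards, 57, 1.1, 16) / 1e6;
    return t;
}
```

The results come to about 6.76 and 48.15 Pflop/s (54.9 in total), and the 30.65 Pflop/s Linpack run works out to roughly 56% of that peak.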
  35. A Growing Software Ecosystem: Developing today on Intel® Xeon Phi™ coprocessors (shown at SC'12, November 2012)
     Other brands and names are the property of their respective owners.
  36. Outline
     • Introduction
     • High-level overview of the Intel® Xeon Phi™ platform: Hardware and Software
     • Intel Xeon Phi Case Studies
     • Intel Xeon Phi Ecosystem
     • Conclusions & References
  37. Conclusions
     Intel® Xeon Phi™ coprocessor advantages:
     • Comparable performance potential to other accelerators
     • Faster time to solution due to reduced development effort
     • Better investment protection with a single code base for processors and coprocessors
     A flexible, wide range of programming models: from pure native to offloaded, and all variants in between.
     All with the familiar Intel development environment.
  38. Intel® Xeon Phi™ Coprocessor Developer Site: http://software.intel.com/mic-developer
     One-stop shop for:
     • Tools & software downloads
     • Getting started
     • Development guides
     • Video workshops, tutorials, & events
     • Code samples & case studies
     • Articles, forums, & blogs
     • Associated product links
  39. Thank you.
  40. Legal Disclaimer & Optimization Notice
     INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
     Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
     Optimization Notice
     Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
     Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.