The Path to Exascale Computing –
Challenges and Opportunities
HPC Meet-up
21st May
Gaurav Kaul
Solutions Architect
Intel
Outline
Why Exascale?
Existing Trends – The End of Moore’s Law?
Major Technology Challenges (aka “Walls”)
Technologies On the Horizon
Scaling Applications for Peta/Exa-Scale Era
Summary
Performance Roadmap
[Figure: performance in GFLOP (log scale, 1.E-04 to 1.E+08) versus year (1960 to 2020), with milestones marked at MFLOP, GFLOP, TFLOP, PFLOP, and EFLOP, interval labels of 12 Years, 11 Years, and 10 Years, and series labeled Client and Hand-held.]
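The interval labels on the roadmap (12, 11, and 10 years) can be turned into annual growth rates. The sketch below assumes each interval corresponds to one 1000x step between adjacent milestones (for example TFLOP to PFLOP); the chart text does not state which label belongs to which step, so that pairing is left open. The function names are illustrative only.

```python
import math


def annual_growth_factor(step_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to step_factor over the given years."""
    return step_factor ** (1.0 / years)


def doubling_time_years(step_factor: float, years: float) -> float:
    """Years needed to double performance at that constant growth rate."""
    return years * math.log(2.0) / math.log(step_factor)


# Assumption: every milestone-to-milestone step is a factor of 1000.
STEP = 1000.0

for years in (12, 11, 10):
    growth = annual_growth_factor(STEP, years)
    doubling = doubling_time_years(STEP, years)
    print(f"{years} years per 1000x: {growth:.2f}x per year, "
          f"doubling every {doubling:.2f} years")
```

Under this assumption, a shorter interval means a faster compounding rate: a 10-year step corresponds to performance roughly doubling every year.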
A bit of History …
The Top 500 Waterfall
50 years of Moore’s Law
Moore and Dennard Scaling
Current Processor Performance Trends
Technology Scaling Outlook
The Power & Energy Challenge
           TFLOP Machine today    TFLOP Machine then (With Exa Technology)
Compute    200W                   5W
Memory     150W                   2W
Com        100W                   ~5W
Disk       100W                   ~3W
Other      4550W                  5W
Total      5KW                    ~20W
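To see why these per-TFLOP figures matter, the totals on the slide (5KW today, ~20W with Exa technology) can be scaled to a full exaflop machine. This is a minimal sketch that assumes power scales linearly with the number of TFLOP-class units, which ignores interconnect and facility overheads; the function name is hypothetical.

```python
# 1 EFLOP/s = 10^6 TFLOP/s
TFLOPS_PER_EFLOPS = 1_000_000


def exaflop_power_watts(watts_per_tflop: float) -> float:
    """Power of an exaflop machine built from TFLOP-class units (linear scaling assumed)."""
    return watts_per_tflop * TFLOPS_PER_EFLOPS


today_w = exaflop_power_watts(5_000)  # slide total: 5KW per TFLOP today
exa_w = exaflop_power_watts(20)       # slide total: ~20W per TFLOP with Exa technology

print(f"Exaflop at today's efficiency: {today_w / 1e9:.1f} GW")
print(f"Exaflop with Exa technology:   {exa_w / 1e6:.1f} MW")
print(f"Required efficiency gain:      {today_w / exa_w:.0f}x")
```

Under this assumption, the gap between the two columns is the difference between a gigawatt-class and a megawatt-class machine.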
Promising Technologies
Rethink System Level Architecture
DRAM Scaling Using 3D Memory
Innovative Packaging and I/O
Needs a Paradigm Shift
Evaluate each (old) architecture feature with new priorities
Past and present priorities:
- Single thread performance: Frequency
- Programming productivity: Legacy, compatibility; architecture features for productivity
- Constraints: (1) Cost, (2) Reasonable Power/Energy
Future priorities:
- Throughput performance: Parallelism
- Power/Energy: Architecture features for energy; simplicity
- Constraints: (1) Programming productivity, (2) Cost
Intel: Investing to Remove 6 Bottlenecks
- Interconnect
- Memory & Storage
- Processor Performance
- Reliability and Resiliency
- Standard Programming Model for Parallelism
- Power Efficiency
Impact on Applications
The Many Ways to Parallelism
And New Workloads will Emerge
Code Modernization – The 4D Approach
New for Knights Landing (Next Generation Intel® Xeon Phi™ Products)
2nd half ’15: 1st commercial systems
3+ TFLOPS In One Package
Parallel Performance & Density
On-Package Memory: High Performance
- up to 16GB at launch
- 5X Bandwidth vs DDR4
Compute: Intel® Silvermont Arch. (Intel® Atom™)
- Low-Power Cores with HPC Enhancements
- 3X Single Thread Performance vs Prior Gen.
- Intel Xeon Processor Binary Compatible
- 1/3X the Space
- 5X Power Efficiency
Integrated Fabric
Platform Memory: DDR4 Bandwidth and Capacity Comparable to Intel® Xeon® Processors
[Figure: processor package with Intel® Silvermont Arch. cores enhanced for HPC. Conceptual, not actual package layout.]
LEARN MORE: Knights Landing Webcast (Tuesday June 24th):
https://www.brighttalk.com/webcast/10773/116329
Jointly Developed with Micron Technology
What is an FPGA?
FPGAs (Field Programmable Gate Arrays) are
semiconductor devices that can be programmed
- Desired functionality of the FPGA can be (re-)programmed by
downloading a configuration into the device
FPGAs offer several advantages over potential
alternatives:
- Lower one-time development cost, and faster time to market
compared to custom designed chips (ASICs)
- Ability to implement customer-specific functionality beyond
what is available from standard products (ASSPs)
- Customizable and reprogrammable after the device has been
deployed to the field compared to both ASIC and ASSP
Acceleration Architectural Landscape
[Figure: energy efficiency (MOPS/mW, log scale from 0.01 to 1000) versus processor number (1 to 20, sorted by efficiency), with groups labeled Microprocessors, Reconfigurable, and Dedicated HW, arrows labeled "More programmable" and "More efficient", and gaps marked 10X and 100X. Source: ISSCC Proceedings]
Potential for 10-100X higher performance/watt vs. general purpose cores
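The 10-100X performance/watt claim can be read as throughput under a fixed power budget. The sketch below uses the chart's unit (MOPS/mW, which equals GOPS/W); the baseline efficiency of 1 MOPS/mW and the 100W budget are hypothetical values chosen only for illustration, not figures from the chart.

```python
def throughput_gops(efficiency_mops_per_mw: float, power_budget_w: float) -> float:
    """Sustained throughput in GOPS for a given efficiency and power budget.

    1 MOPS/mW = 1e6 ops/s per 1e-3 W = 1e9 ops/s per W = 1 GOPS/W.
    """
    return efficiency_mops_per_mw * power_budget_w


def power_for_throughput_w(target_gops: float, efficiency_mops_per_mw: float) -> float:
    """Power in watts needed to sustain target_gops at the given efficiency."""
    return target_gops / efficiency_mops_per_mw


BASELINE_MOPS_PER_MW = 1.0  # hypothetical general purpose core
POWER_BUDGET_W = 100.0      # hypothetical budget

for gain in (1, 10, 100):
    eff = BASELINE_MOPS_PER_MW * gain
    print(f"{gain:>3}x efficiency: "
          f"{throughput_gops(eff, POWER_BUDGET_W):.0f} GOPS at {POWER_BUDGET_W:.0f} W, "
          f"or {power_for_throughput_w(100.0, eff):.0f} W for 100 GOPS")
```

The same gain can be spent either way: more throughput in the same power envelope, or the same throughput at a fraction of the power.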
FPGAs as Reconfigurable Accelerators
Intel Confidential — Do Not Forward
Example Use Case – HFT
What will matter in 10 years
What Next?
Summary
