Why we need Exascale and why we won't get there by 2020


By Horst Simon, deputy director of Lawrence Berkeley National Laboratory
Optical Interconnects Conference, Santa Fe, New Mexico, May 6, 2013


  1. Why we need Exascale and why we won’t get there by 2020
     Horst Simon, Lawrence Berkeley National Laboratory
     Optical Interconnects Conference, Santa Fe, New Mexico, May 6, 2013
  2. Overview
     • Current state of HPC: petaflops firmly established
     • Why we won’t get to exaflops (in 2020)
     • The case for exascale
       - Science (old)
       - Science (new)
       - Technology and economic competitiveness
       - Discovery
     Thank you to the DOE Hexlab group, the TOP500 team, John Shalf, Kathy Yelick and many others.
  3. Current State of HPC: petaflops firmly established
  4. Top 10 Systems in November 2012

     | # | Site | Manufacturer | Computer | Country | Cores | Rmax [Pflops] | Power [MW] |
     |---|------|--------------|----------|---------|-------|---------------|------------|
     | 1 | Oak Ridge National Laboratory | Cray | Titan: Cray XK7, Opteron 16C 2.2GHz, Gemini, NVIDIA K20x | USA | 560,640 | 17.6 | 8.21 |
     | 2 | Lawrence Livermore National Laboratory | IBM | Sequoia: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 1,572,864 | 16.3 | 7.89 |
     | 3 | RIKEN Advanced Institute for Computational Science | Fujitsu | K Computer: SPARC64 VIIIfx 2.0GHz, Tofu Interconnect | Japan | 705,024 | 10.5 | 12.66 |
     | 4 | Argonne National Laboratory | IBM | Mira: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 786,432 | 8.16 | 3.95 |
     | 5 | Forschungszentrum Juelich (FZJ) | IBM | JuQUEEN: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | Germany | 393,216 | 4.14 | 1.97 |
     | 6 | Leibniz Rechenzentrum | IBM | SuperMUC: iDataPlex DX360M4, Xeon E5 8C 2.7GHz, Infiniband FDR | Germany | 147,456 | 2.90 | 3.52 |
     | 7 | Texas Advanced Computing Center/UT | Dell | Stampede: PowerEdge C8220, Xeon E5 8C 2.7GHz, Intel Xeon Phi | USA | 204,900 | 2.66 | |
     | 8 | National SuperComputer Center in Tianjin | NUDT | Tianhe-1A: NUDT TH MPP, Xeon 6C, NVidia, FT-1000 8C | China | 186,368 | 2.57 | 4.04 |
     | 9 | CINECA | IBM | Fermi: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | Italy | 163,840 | 1.73 | 0.82 |
     | 10 | IBM | IBM | DARPA Trial Subset: Power 775, Power7 8C 3.84GHz, Custom | USA | 63,360 | 1.52 | 3.57 |
  5. Technology Paths to Exascale
     • Leading Technology Paths (Swim Lanes)
       - Multicore: Maintain complex cores, and replicate (x86, SPARC, Power7) [#3, 6, and 10]
       - Manycore/Embedded: Use many simpler, low-power cores from embedded (BlueGene) [#2, 4, 5, and 9]
       - GPU/Accelerator: Use highly specialized processors from the gaming/graphics market space (NVidia Fermi, Cell, Intel Phi (MIC)) [#1, 7, and 8]
  6. Events of 2011/2012 and the Big Questions for the Next Three Years until 2015
     • Multicore: IBM cancels “Blue Waters” contract. Is this an indication that multicore with complex cores is nearing the end of the line?
     • Manycore/Embedded: Blue Gene is the last of the line. Will there be no more large scale embedded multicore machines?
     • GPU/Accelerator: Intel bought Cray’s interconnect technology and WhamCloud. Will Intel develop complete systems?
     ✔ Prediction: All Top 10 systems in 2015 will be GPU/Accelerator based
  7. Titan System at Oak Ridge National Laboratory (accelerator)
     • Upgrade of Jaguar from Cray XT5 to XK6
     • Cray Linux Environment operating system
     • Gemini interconnect
       - 3-D Torus
       - Globally addressable memory
       - Advanced synchronization features
     • AMD Opteron 6274 processors (Interlagos)
     • New accelerated node design using NVIDIA multi-core accelerators
       - 2011: 960 NVIDIA x2090 “Fermi” GPUs
       - 2012: 14,592 NVIDIA “Kepler” GPUs
     • 20+ PFlops peak system performance
     • 600 TB DDR3 mem. + 88 TB GDDR5 mem
  8. Sequoia at Lawrence Livermore National Laboratory (Manycore/embedded)
     • Fully deployed in June 2012
     • 16.32475 petaflops Rmax and 20.13266 petaflops Rpeak
     • 7.89 MW power consumption
     • IBM Blue Gene/Q
     • 98,304 compute nodes
     • 1.6 million processor cores
     • 1.6 PB of memory
     • 8 or 16 core Power Architecture processors built on a 45 nm fabrication process
     • Lustre Parallel File System
  9. K Computer at RIKEN Advanced Institute for Computational Science (Multicore)
     • #3 on the TOP500 list
     • Distributed memory architecture
     • Manufactured by Fujitsu
     • Performance: 10.51 PFLOPS Rmax and 11.2804 PFLOPS Rpeak
     • 12.65989 MW power consumption, 824.6 GFLOPS/kW
     • Annual running costs: $10 million
     • 864 cabinets:
       - 88,128 8-core SPARC64 VIIIfx processors @ 2.0 GHz
       - 705,024 cores
       - 96 computing nodes + 6 I/O nodes per cabinet
     • Water cooling system minimizes failure rate and power consumption
  10. Choice of Swim Lanes
      • Swim Lane choices: currently all three are equally represented
      • Risk in Swim Lane switch for large facilities
        - Select too soon: Users cannot follow
        - Select too late: Fall behind performance curve
        - Select incorrectly: Subject users to multiple disruptive technology changes
  11. Edison is the next production computing system dedicated to DOE Office of Science
      • First Cray Cascade System
        - Developed for the DARPA HPCS Program
        - First Cray system with Intel processors
        - New Aries interconnect
      • Energy efficient design
        - Warm water cooling
        - Power capping
        - ~1.1 MW per PF with conventional CPUs only
      • 5200 compute nodes
      • > 100K processing cores
      • 333 Terabytes memory
      • > 2 petaflops peak
  12. NERSC Roadmap
      [Figure: roadmap chart. The system labels, axis ticks, and years did not survive text extraction.]
  13. Events of 2011/2012 and the Big Question for the Next Three Years until 2016
      • IBM cancels “Blue Waters” contract → Is this an indication that multicore with complex cores is nearing the end of the line?
      • Blue Gene Q is the last of the line. → Will there be no more large scale embedded multicore machines?
      • Intel bought Cray’s interconnect technology and WhamCloud → Will Intel develop complete systems?
      • Will there be only manycore and accelerator based systems among the TOP10 in 2016?
  14. Performance Development
      [Figure: TOP500 performance over time, 1994 to 2012, log scale from 100 Mflop/s to 1 Eflop/s. Series and their labeled start and end values: SUM, 1.17 TFlop/s to 162 PFlop/s; N=1, 59.7 GFlop/s to 17.6 PFlop/s; N=500, 400 MFlop/s to 76.5 TFlop/s.]
      Source: TOP500 November 2012
  15. State of Supercomputing in 2013
      • Pflops computing fully established with more than 20 machines
      • Three technology “swim lanes” are thriving
      • Interest in supercomputing is now worldwide, and growing in many new markets
      • Exascale projects in many countries and regions
      • Rapid growth predicted by IDC for the next three years
  16. Why we won’t get to Exaflops by 2020
  17. The Power and Clock Inflection Point in 2004
      [Figure: processor power and clock trends over time. The axis labels and legend did not survive text extraction.]
      Source: Kogge and Shalf, IEEE CISE
  18. Towards Exascale
      • Exascale has been discussed in numerous workshops, conferences, planning meetings for about six years
      • Exascale projects have been started in the U.S. and many other countries and regions
      • Key challenges to exascale remain
  19. The Exascale Challenge
      • Build a system before 2020 that will be #1 on the TOP500 list with an Rmax > 1 exaflops
      • There is a lot of vagueness in the exascale discussion and what it means to reach exascale. Let me propose a concrete and measurable goal.
      • Personal bet (with Thomas Lippert, Jülich, Germany) that we will not reach this goal by November 2019 (for $2,000 or €2,000)
      • Unfortunately, I think that I will win the bet. But I would rather see an exaflops before the end of the decade and lose my bet.
  20. At Exascale, even LINPACK is a Challenge
      [Figure: HPL results for the #1 system on the TOP500 (15 machines in that club).]
      Source: Jack Dongarra, ISC’12
  21. At Exascale HPL will take 5.8 days
      Source: Jack Dongarra, ISC’12
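
To get a feel for the 5.8-day figure, the sketch below works backwards from it. It relies on the standard leading-order operation count for a dense LU solve, about (2/3)n^3 flops, and ignores lower-order terms. The assumption that the machine sustains exactly 1 Eflop/s for the whole run is ours, and the function names are illustrative; the slide does not state the problem size or efficiency behind Dongarra's estimate.

```python
# Back-of-envelope HPL run length (illustrative sketch, not Dongarra's model).
SECONDS_PER_DAY = 86_400


def hpl_flops(n: float) -> float:
    """Leading-order operation count for solving a dense n x n system by LU."""
    return (2.0 / 3.0) * float(n) ** 3


def hpl_days(n: float, sustained_flops: float) -> float:
    """Days needed to run HPL of order n at a given sustained rate (flop/s)."""
    return hpl_flops(n) / sustained_flops / SECONDS_PER_DAY


def implied_order(days: float, sustained_flops: float) -> float:
    """Matrix order n whose HPL run would last `days` at the given rate."""
    return (1.5 * sustained_flops * days * SECONDS_PER_DAY) ** (1.0 / 3.0)


EXAFLOP = 1e18  # assumption: the machine sustains exactly 1 Eflop/s
n = implied_order(5.8, EXAFLOP)        # 5.8 days is the figure on the slide
matrix_petabytes = 8 * n ** 2 / 1e15   # 8 bytes per double-precision entry

print(f"implied matrix order n ~ {n:.3g}")
print(f"matrix storage ~ {matrix_petabytes:.0f} PB")
print(f"round trip: {hpl_days(round(n), EXAFLOP):.2f} days")
```

Under these assumptions the matrix order comes out near 9 × 10^7, with tens of petabytes needed just to hold the matrix, which gives a sense of why a single benchmark run becomes a scheduling problem for a center.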
  22. Projected Performance Development
      [Figure: TOP500 performance extrapolated to 2020, log scale from 100 Mflop/s to 1 Eflop/s. Series: SUM, N=1, N=500.]
      Data from: TOP500 November 2012
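
The projection on this slide is a straight line on a log scale, and a rough version can be reproduced from the two N=1 values labeled on the Performance Development slide. The sketch below fits a single exponential through them. The decimal dates are an assumption (first TOP500 list in June 1993, latest in November 2012), and the helper names are illustrative.

```python
import math

# N=1 data points read off the Performance Development slide.
# Assumption: the first point belongs to the June 1993 list and the
# last to the November 2012 list, expressed as decimal years.
T0, P0 = 1993.5, 59.7e9      # flop/s
T1, P1 = 2012.83, 17.6e15    # flop/s
TARGET = 1e18                # 1 Eflop/s


def growth_rate(t0: float, p0: float, t1: float, p1: float) -> float:
    """Continuous exponential growth rate per year between two points."""
    return math.log(p1 / p0) / (t1 - t0)


def year_reaching(target: float, t_ref: float, p_ref: float, rate: float) -> float:
    """Year at which the exponential trend through (t_ref, p_ref) hits target."""
    return t_ref + math.log(target / p_ref) / rate


rate = growth_rate(T0, P0, T1, P1)
crossing = year_reaching(TARGET, T1, P1, rate)
print(f"annual growth factor ~ {math.exp(rate):.2f}x")
print(f"trend line crosses 1 Eflop/s around {crossing:.1f}")
```

A two-point fit like this puts the crossing near the end of the decade. It only says what the historical trend implies; the slides on power and data movement argue that the trend will not hold.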
  23. Total Power Levels (kW) for TOP500 Systems
      [Figure: power (MWatts, 0 to 13) plotted against TOP500 rank (0 to 500).]
      Source: TOP500 November 2012
  24. Power Efficiency has gone up significantly in 2012
      [Figure: Linpack/Power (Gflops/kW, 0 to 1200) for 2008 to 2012. Series: TOP10, TOP50, TOP500.]
      Data from: TOP500 November 2012
  25. Most Power Efficient Architectures

      | Computer | Rmax/Power [Mflops/Watt] |
      |----------|--------------------------|
      | Appro GreenBlade, Xeon 8C 2.6GHz, Infiniband FDR, Intel Xeon Phi | 2,450 |
      | Cray XK7, Opteron 16C 2.1GHz, Gemini, NVIDIA Kepler | 2,243 |
      | BlueGene/Q, Power BQC 16C 1.60 GHz, Custom | 2,102 |
      | iDataPlex DX360M4, Xeon 8C 2.6GHz, Infiniband QDR, Intel Xeon Phi | 1,935 |
      | RSC Tornado, Xeon 8C 2.9GHz, Infiniband FDR, Intel Xeon Phi | 1,687 |
      | SGI Rackable, Xeon 8C 2.6GHz, Infiniband FDR, Intel Xeon Phi | 1,613 |
      | Chundoong Cluster, Xeon 8C 2GHz, Infiniband QDR, AMD Radeon HD | 1,467 |
      | Bullx B505, Xeon 6C 2.53GHz, Infiniband QDR, NVIDIA 2090 | 1,266 |
      | Intel Cluster, Xeon 8C 2.6GHz, Infiniband FDR, Intel Xeon Phi | 1,265 |
      | Xtreme-X, Xeon 8C 2.6GHz, Infiniband QDR, NVIDIA 2090 | 1,050 |

      [Tflops/MW] = [Mflops/Watt]
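
The Rmax/Power metric is easy to recompute from the Top 10 table, and it can be turned around to ask how much power an exaflop would need at 2012 efficiency. The sketch below does both. The function names are illustrative, and holding efficiency constant while scaling to 1 Eflop/s is a deliberate simplification, not a claim from the slides.

```python
def mflops_per_watt(rmax_pflops: float, power_mw: float) -> float:
    """Linpack efficiency in Mflops/W (numerically equal to Tflops/MW)."""
    return (rmax_pflops * 1e15) / (power_mw * 1e6) / 1e6


def power_mw_for(target_flops: float, efficiency_mflops_per_watt: float) -> float:
    """Power in MW needed to sustain target_flops at a given efficiency."""
    watts = target_flops / (efficiency_mflops_per_watt * 1e6)
    return watts / 1e6


# Rmax and power as listed for Titan and Sequoia in the November 2012 Top 10.
titan = mflops_per_watt(17.6, 8.21)
sequoia = mflops_per_watt(16.3, 7.89)

# Best entry on the "Most Power Efficient Architectures" slide.
BEST_2012 = 2450.0
exaflop_power = power_mw_for(1e18, BEST_2012)

print(f"Titan:   {titan:,.0f} Mflops/W")
print(f"Sequoia: {sequoia:,.0f} Mflops/W")
print(f"1 Eflop/s at {BEST_2012:,.0f} Mflops/W needs ~{exaflop_power:,.0f} MW")
```

Even at the best efficiency in the table, an exaflop would draw several hundred megawatts, far above the 20 to 30 MW envelope quoted later in the talk.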
  26. Power Efficiency over Time
      [Figure: Linpack/Power (Gflops/kW, 0 to 3,000) for 2008 to 2012. Series: TOP10, TOP50, TOP500. Annotations: Accelerator and BG; multicore.]
      Data from: TOP500 November 2012
  27. Power Efficiency over Time
      [Figure: Linpack/Power (Gflops/kW, 0 to 3,000) for 2008 to 2012. Series: TOP10, TOP50, TOP500. Annotation: One time technology improvement, not a change in trend rate.]
      Data from: TOP500 November 2012
  28. The Problem with Wires: Energy to move data proportional to distance
      • Cost to move a bit on copper wire:
        - energy = bitrate × length² / cross-section area
      • Wire data capacity constant as feature size shrinks
      • Cost to move bit proportional to distance
      • ~1 TByte/sec max feasible off-chip BW (10 GHz/pin)
      • Photonics reduces distance-dependence of bandwidth
      Copper requires signal amplification even for on-chip connections
      Photonics requires no redrive, and passive switches need little power
  29. The Cost of Data Movement
      [Figure: energy in picojoules/bit, log scale from 1 to 10,000. Labels: now, SMP, MPI.]
  30. The Cost of Data Movement
      [Figure: energy in picojoules/bit, log scale from 1 to 10,000. Labels: now, SMP, MPI, CMP, Cost of a FLOP.]
  31. The Cost of Data Movement in 2018
      [Figure: energy in picojoules/bit by operation type. The category labels did not survive text extraction.]
      • FLOPs will cost less than on-chip data movement! (NUMA)
      • None of the FastForward projects addresses this challenge
  32. It’s the End of the World as We Know It! Summary Trends
      [Figure: summary trend chart. The axis labels and legend did not survive text extraction.]
      Source: Kogge and Shalf, IEEE CISE
  33. Why I believe that I will win my bet
      1. HPL at Exaflop/s level will take about a week to run: a challenge for center directors and new systems.
      2. We realized a one time gain in power efficiency by switching to accelerator/manycore. This is not a sustainable trend in the absence of other new technology.
      3. Data movement will cost more than flops (even on chip).
      4. Limited amount of memory, low memory/flop ratios (processing is free)
  34. The Logical Conclusion
      If FLOPS are free, then why do we need an “exaflops” initiative?
      “Exaflops”, “Exascale”, “Exa”-anything has become a bad brand
      • Associated with buying big machines for the labs
      • Associated with “old” HPC
      • Sets up the community for “failure,” if “goal” can’t be met
  35. It is not just “exaflops”: we are changing the whole computational model
      Current programming systems have WRONG optimization targets

      Old Constraints
      • Peak clock frequency as primary limiter for performance improvement
      • Cost: FLOPs are biggest cost for system: optimize for compute
      • Concurrency: Modest growth of parallelism by adding nodes
      • Memory scaling: maintain byte per flop capacity and bandwidth
      • Locality: MPI+X model (uniform costs within node & between nodes)
      • Uniformity: Assume uniform system performance
      • Reliability: It’s the hardware’s problem

      New Constraints
      • Power is primary design constraint for future HPC system design
      • Cost: Data movement dominates: optimize to minimize data movement
      • Concurrency: Exponential growth of parallelism within chips
      • Memory Scaling: Compute growing 2x faster than capacity or bandwidth
      • Locality: must reason about data locality and possibly topology
      • Heterogeneity: Architectural and performance non-uniformity increase
      • Reliability: Cannot count on hardware protection alone

      [Figure: chart between the two columns. Its labels did not survive text extraction.]
      Fundamentally breaks our current programming paradigm and computing ecosystem
  36. The Science Case for Exascale (old)
  37. Climate change analysis
      Simulations
      • Cloud resolution, quantifying uncertainty, understanding tipping points, etc., will drive climate to exascale platforms
      • New math, models, and systems support will be needed
      Extreme data
      • “Reanalysis” projects need 100× more computing to analyze observations
      • Machine learning and other analytics are needed today for petabyte data sets
      • Combined simulation/observation will empower policy makers and scientists
  38. Qualitative Improvement of Simulation with Higher Resolution (2011)
      Michael Wehner, Prabhat, Chris Algieri, Fuyu Li, Bill Collins, Lawrence Berkeley National Laboratory; Kevin Reed, University of Michigan; Andrew Gettelman, Julio Bacmeister, Richard Neale, National Center for Atmospheric Research
  39. Exascale combustion simulations
      • Goal: 50% improvement in engine efficiency
      • Center for Exascale Simulation of Combustion in Turbulence (ExaCT)
      • Combines M&S and experimentation
      • Uses new algorithms, programming models, and computer science
  40. Warhead assessment and certification of a smaller nuclear stockpile
      • Assessment process: All aspects will require exascale computing to support commitment never to return to underground testing
      • Modeling molten tantalum: 10M atoms required; modeling molten plutonium: 1000× more computing power
      • Self-steering (a data analysis challenge): Essential to minimize number of simulations
      Predicting with confidence requires precise understanding of aging and individual life-extended weapons
      • High accuracy in individual weapons calculations:
        - Advanced physical models
        - 3-D to resolve features and phenomena
        - Extreme resolution
      • Uncertainty quantification:
        - Massive numbers of high-resolution 3-D simulations to explore impacts of small variations in individual devices
  41. The Science Case for Exascale (new)
  42. Materials genome
      Computing 1000× today
      • Key to DOE’s Energy Storage Hub
      • Tens of thousands of simulations used to screen potential materials
      • Need more simulations and fidelity for new classes of materials, studies in extreme environments, etc.
      Data services for industry and science
      • Results from tens of thousands of simulations web-searchable
      • Materials Project launched in October 2012, now has >3,000 registered users
      • Increase U.S. competitiveness; halve the 18-year time from discovery to market
      [Figure labels: Today’s batteries; Voltage limit; Interesting materials.]
      By 2018:
      • Increase energy density (70 miles → 350 miles)
      • Reduce battery cost per mile ($150 → $30)
  43. DOE Systems Biology Knowledgebase: KBase
      Microbes, Communities, Plants
      • Integration and modeling for predictive biology
      • Knowledgebase enabling predictive systems biology
        - Powerful modeling framework
        - Community-driven, extensible and scalable open-source software and application system
        - Infrastructure for integration and reconciliation of algorithms and data sources
        - Framework for standardization, search, and association of data
        - Resource to enable experimental design and interpretation of results
  44. Obama Announces BRAIN Initiative: Proposes $100 million in his FY2014 budget
      • Brain Research through Advancing Innovative Neurotechnologies
      • Create real-time traffic maps to provide new insights into brain disorders
      • “There is this enormous mystery waiting to be unlocked, and the BRAIN Initiative will change that by giving scientists the tools they need to get a dynamic picture of the brain in action and better understand how we think and how we learn and how we remember,” Barack Obama
      See Alivisatos et al., Neuron 74, June 2012, pg 970
  45. The Cat is out of the Bag: Cortical Simulations with 10^9 neurons, 10^13 synapses
  46. From Brain to Model
      [Figure labels: Pyramidal; Fast spiking.]
  47. Modha Group at IBM Almaden

      | Species | N (neurons) | S (synapses) |
      |---------|-------------|--------------|
      | Mouse | 16 × 10^6 | 128 × 10^9 |
      | Rat | 56 × 10^6 | 448 × 10^9 |
      | Cat | 763 × 10^6 | 6.1 × 10^12 |
      | Monkey | 2 × 10^9 | 20 × 10^12 |
      | Human | 22 × 10^9 | 220 × 10^12 |

      Systems shown: Almaden BG/L (December 2006); Watson BG/L (April 2007); Watson Shaheen BG/P (March 2009); LLNL Dawn BG/P (May 2009); LLNL Sequoia BG/Q (June 2012)
      Latest simulations in 2012 achieve unprecedented scale of 65 × 10^9 neurons and 16 × 10^12 synapses
      New results for SC12
  48. Towards Exascale
      [Figure: TOP500 performance extrapolated to 2020, log scale from 100 Mflop/s to 1 Eflop/s. Series: SUM, N=1, N=500.]
      Data: TOP500 November 2012
  49. Towards Exascale: the Power Conundrum
      • Straightforward extrapolation results in a real-time human brain scale simulation at about 1 - 10 Exaflop/s with 4 PB of memory
      • Current predictions envision Exascale computers in 2020 with a power consumption of at best 20 - 30 MW
      • The human brain takes 20 W
      • Even under best assumptions in 2020 our brain will still be a million times more power efficient
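
The "million times" figure follows directly from the two power numbers on this slide. The short sketch below makes the arithmetic explicit. It assumes the machine and the brain are credited with the same computational throughput, so the ratio of power draws is also the ratio of efficiencies; the names are illustrative.

```python
BRAIN_WATTS = 20.0             # from the slide
MACHINE_WATTS = (20e6, 30e6)   # projected 20 to 30 MW exascale system


def efficiency_ratio(machine_watts: float, brain_watts: float = BRAIN_WATTS) -> float:
    """How many times more power the machine draws for the same workload.

    Assumption: machine and brain are credited with the same computational
    throughput, so the power ratio is also the efficiency ratio.
    """
    return machine_watts / brain_watts


low, high = (efficiency_ratio(w) for w in MACHINE_WATTS)
print(f"brain advantage: {low:.1e}x to {high:.1e}x")
```

If the brain-scale simulation needs closer to 10 Exaflop/s, the upper end of the slide's range, the gap widens by a further factor of ten.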
  50. Exascale is Critical to the Nation to Develop Future Technologies and Maintain Economic Competitiveness
  51. U.S. competitive advantage demands exascale resources
      • Digital design and prototyping at exascale enable rapid delivery of new products to market by minimizing the need for expensive, dangerous, and/or inaccessible testing
      • Potential key differentiator for American competitiveness
      • Strategic partnerships between DOE labs and industrial partners to develop and scale applications to exascale levels
  52. Exascale computing is key for national security
      “I think that what we’ve seen is that they may be somewhat further ahead in the development of that aircraft than our intelligence had earlier predicted.”
      Defense Secretary Robert M. Gates (The New York Times, Jan. 9, 2011)
      Potential adversaries are not unaware of this
  53. Exascale technologies are the foundation for future leadership in computing
      • The U.S. is not the only country capable of achieving exascale computing.
      • The country that is first to exascale will have significant competitive intellectual, technological, and economic advantages.
      • Achievement of the power efficiency and reliability goals needed for exascale will have enormous positive impacts on consumer electronics and business information technologies and facilities.
      [Figure: The gap in available supercomputing capacity between the United States and the rest of the world has narrowed, with China gaining the most ground. Data from TOP500.org; Figure from Science; Vol 335; 27 January 2012]
  54. Exascale is Discovery
  55. Admiral Zheng He’s Voyages in 1416
      http://en.wikipedia.org/wiki/Zheng_He
  56. The Rise of the West
      “Like the Apollo moon missions, Zheng He’s voyages had been a formidable demonstration of wealth and technological sophistication. Landing … on the East African coast in 1416 was in many ways … comparable to landing an American astronaut on the moon in 1969. By abruptly canceling oceanic exploration, Yongle’s successors ensured that the economic benefits remained negligible.”
  57. Exascale is the next step in a long voyage of discovery; we should not give up now and retreat
  58. Discussion
  59. Exascale rationale
      • Exascale computing means performing computer simulations on a machine capable of 10 to the 18th power floating point operations per second, roughly 1,000× the current state of the art.
      • The U.S. is capable of reaching exascale simulation capability within ten years with cost-effective, efficient computers. An accelerated program to achieve exascale is required to meet this goal, and doing so will enable the U.S. to lead in key areas of science and engineering.
      • The U.S. is not the only country capable of achieving exascale computing. The country that is first to exascale will have significant competitive intellectual, technological, and economic advantages.
      • Exascale systems and applications will improve predictive understanding in complex systems (e.g. climate, energy, materials, nuclear stockpile, emerging national security threats, biology, …)
      • Tightly coupled exascale systems make possible fundamentally new approaches to uncertainty quantification and data analytics; uncertainty quantification is the essence of predictive simulation.
      • American industry is increasingly dependent on HPC for design and testing in a race to get products to market. Exascale computing and partnerships with DOE laboratories could be THE key differentiator, a winning U.S. strategy for advanced manufacturing and job creation.
      • The scale of computing developed for modeling and simulation will have a revolutionary impact on data analysis and data modeling. Much of the R&D required is common to big data and exascale computing.
      • Exascale provides a ten-year goal to drive modeling and simulation developments forward and to attract young scientific talent to DOE in a variety of scientific and technology fields of critical importance to the future of the U.S.
      • Achievement of the power efficiency and reliability goals needed for exascale will have enormous positive impacts on consumer electronics and business information technologies and facilities.
  60. Summary
      • There is progress in Exascale with many projects now focused and on their way, e.g. FastForward, Xstack, and Co-Design Centers in the U.S.
      • HPC has moved to low-power processing, and the processor growth curves in energy efficiency could get us into the range of exascale feasibility
      • Memory and data movement are still more open challenges
      • Programming model needs to address a heterogeneous, massively parallel environment, as well as data locality
      • Exascale applications will be a challenge simply because of their sheer size and the memory limitations
  61. Sustained petaflops performance is a reality: congratulations to NCSA!