Top500 Slides for June 2014


The first version of what became today’s TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time they realized they might be on to something and decided to continue compiling the list, which is now a much-anticipated, much-watched and much-debated twice-yearly event.

The TOP500 list is compiled by Erich Strohmaier and Horst Simon of Lawrence Berkeley National Laboratory; Jack Dongarra of the University of Tennessee, Knoxville; and Martin Meuer of Prometeus, Germany.

  • Tianhe-1A #1 Nov 10
  • November 2013 TOP10
  • Record low turnover
  • Annual versus Moore’s Law
    1.87 TOP500 versus 1.59 Moore’s Law
  • Nov 2009: N=500 growth starts lagging – dropped from 200% to 150% per year
  • Statistically significant inflection point in June 2008 for the end of the list
  • No. 500 will lag by 10x by the end of the decade if this continues
  • Exceptional situation since 2011/12: the largest group of large systems at the top, not just No. 1
  • Gini: 0 = perfect equality (all systems are the same size); 1 = total inequality (one system has all Rmax)

  • Third slice (dark) is Xeon Phi
  • How many lists a system has been on in the TOP500
    “Stimulus in 2007/2008” but
    lack of replacement spending in government and industry alike since 2010
  • Average age till 2011 was 1.27 years
  • The average age of systems is interesting:
    US grew from 1.25 to 2.25 years
    Europe grew to 2.75 years
    Japan always had 2-3 year old systems, now close to 3 years
    China is at 1.1 years and has the youngest population by far – China kept spending
  • Fujitsu, Dell, and Dawning have 2 each
  • 38% and 26% per year

Transcript

  • 1. Highlights of the 43rd TOP500 List ISC’14, Leipzig, Germany
  • 2. 43rd List: The TOP10
    # | Site | Manufacturer | Computer | Country | Cores | Rmax [Pflops] | Power [MW]
    1 | National University of Defense Technology | NUDT | Tianhe-2: NUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Intel Xeon Phi | China | 3,120,000 | 33.9 | 17.8
    2 | Oak Ridge National Laboratory | Cray | Titan: Cray XK7, Opteron 16C 2.2GHz, Gemini, NVIDIA K20x | USA | 560,640 | 17.6 | 8.21
    3 | Lawrence Livermore National Laboratory | IBM | Sequoia: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 1,572,864 | 17.2 | 7.89
    4 | RIKEN Advanced Institute for Computational Science | Fujitsu | K Computer: SPARC64 VIIIfx 2.0GHz, Tofu Interconnect | Japan | 705,024 | 10.5 | 12.7
    5 | Argonne National Laboratory | IBM | Mira: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 786,432 | 8.59 | 3.95
    6 | Swiss National Supercomputing Centre (CSCS) | Cray | Piz Daint: Cray XC30, Xeon E5 8C 2.6GHz, Aries, NVIDIA K20x | Switzerland | 115,984 | 6.27 | 2.33
    7 | Texas Advanced Computing Center/UT | Dell | Stampede: PowerEdge C8220, Xeon E5 8C 2.7GHz, Intel Xeon Phi | USA | 462,462 | 5.17 | 4.51
    8 | Forschungszentrum Juelich (FZJ) | IBM | JUQUEEN: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | Germany | 458,752 | 5.01 | 2.30
    9 | Lawrence Livermore National Laboratory | IBM | Vulcan: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 393,216 | 4.29 | 1.97
    10 | Government | Cray | Cray XC30, Xeon E5 12C 2.7GHz, Aries | USA | 225,984 | 3.14 | (not listed)
  • 3. Highlights: TOP10
    • Titan, a Cray XK7 system installed at the Department of Energy’s (DOE) Oak Ridge National Laboratory, remains the No. 2 system. It achieved 17.59 Pflop/s on the Linpack benchmark using 261,632 of its NVIDIA K20x accelerator cores. Titan is one of the most energy-efficient systems on the list, consuming a total of 8.21 MW and delivering 2.143 Gflops/W.
    • Sequoia, an IBM BlueGene/Q system installed at DOE’s Lawrence Livermore National Laboratory, is again the No. 3 system. It was first delivered in 2011 and has achieved 17.17 Pflop/s on the Linpack benchmark using 1,572,864 cores.
    • Fujitsu’s K computer, installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan, is the No. 4 system with 10.51 Pflop/s on the Linpack benchmark using 705,024 SPARC64 processing cores.
    • Mira, a BlueGene/Q system installed at DOE’s Argonne National Laboratory, is No. 5 with 8.59 Pflop/s on the Linpack benchmark using 786,432 cores.
  • 4. Highlights: TOP10
    • At No. 6 is Piz Daint, a Cray XC30 system installed at the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland, and the most powerful system in Europe. Piz Daint achieved 6.27 Pflop/s on the Linpack benchmark using 73,808 NVIDIA K20x accelerator cores. Piz Daint is also the most energy-efficient system in the TOP10, consuming a total of 2.33 MW and delivering 2.7 Gflops/W.
    • Stampede, a Dell PowerEdge C8220 system installed at the Texas Advanced Computing Center of the University of Texas, Austin, is at No. 7. It also uses Intel Xeon Phi processors (previously known as MIC) to achieve its 5.17 Pflop/s.
    • The second system in Europe is at No. 8. It is also a BlueGene/Q system, called JUQUEEN, installed at the Forschungszentrum Juelich in Germany and listed with 5.01 Pflop/s.
    • No. 9 is taken by Vulcan, another IBM BlueGene/Q system at Lawrence Livermore National Laboratory. It was temporarily combined with the No. 3 system but is now operated independently. It achieved 4.29 Pflop/s.
    • At No. 10 is the only new system in the TOP10, a Cray XC30 installed at a Government location in the USA with 3.14 Pflop/s.
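The Gflops/W figures quoted for Titan and Piz Daint can be reproduced by dividing Linpack performance by power draw. The following is a minimal Python sketch of that unit conversion, using the Rmax and power values quoted in the highlights; the function name `gflops_per_watt` is a hypothetical helper, not something from the slides.

```python
def gflops_per_watt(rmax_pflops: float, power_mw: float) -> float:
    """Convert Rmax in Pflop/s and power in MW to Gflop/s per watt."""
    gflops = rmax_pflops * 1e6  # 1 Pflop/s = 1e6 Gflop/s
    watts = power_mw * 1e6      # 1 MW = 1e6 W
    return gflops / watts


# (Rmax in Pflop/s, power in MW) as quoted in the highlights
systems = {"Titan": (17.59, 8.21), "Piz Daint": (6.27, 2.33)}

for name, (rmax, power) in systems.items():
    print(f"{name}: {gflops_per_watt(rmax, power):.2f} Gflops/W")
```

Because the inputs are rounded, the results can differ in the last digit from the efficiency values published on the list.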
  • 5. 42nd List: The TOP10
    # | Site | Manufacturer | Computer | Country | Cores | Rmax [Pflops] | Power [MW]
    1 | National University of Defense Technology | NUDT | Tianhe-2: NUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Intel Xeon Phi | China | 3,120,000 | 33.9 | 17.8
    2 | Oak Ridge National Laboratory | Cray | Titan: Cray XK7, Opteron 16C 2.2GHz, Gemini, NVIDIA K20x | USA | 560,640 | 17.6 | 8.21
    3 | Lawrence Livermore National Laboratory | IBM | Sequoia: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 1,572,864 | 17.2 | 7.89
    4 | RIKEN Advanced Institute for Computational Science | Fujitsu | K Computer: SPARC64 VIIIfx 2.0GHz, Tofu Interconnect | Japan | 705,024 | 10.5 | 12.7
    5 | Argonne National Laboratory | IBM | Mira: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 786,432 | 8.59 | 3.95
    6 | Swiss National Supercomputing Centre (CSCS) | Cray | Piz Daint: Cray XC30, Xeon E5 8C 2.6GHz, Aries, NVIDIA K20x | Switzerland | 115,984 | 6.27 | 2.33
    7 | Texas Advanced Computing Center/UT | Dell | Stampede: PowerEdge C8220, Xeon E5 8C 2.7GHz, Intel Xeon Phi | USA | 462,462 | 5.17 | 4.51
    8 | Forschungszentrum Juelich (FZJ) | IBM | JUQUEEN: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | Germany | 458,752 | 5.01 | 2.30
    9 | Lawrence Livermore National Laboratory | IBM | Vulcan: BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 393,216 | 4.29 | 1.97
    10 | Leibniz Rechenzentrum | IBM | SuperMUC: iDataPlex DX360M4, Xeon E5 8C 2.7GHz, Infiniband FDR | Germany | 147,456 | 2.90 | 3.52
  • 6. Replacement Rate [chart, 1993-2014; y-axis 0 to 350; labeled value: 116]
  • 7. Highlights from the Overall List
    • The overall list-by-list growth rates of performance are, for the second time in a row, at historically low values.
    • The performance of the last system on the list (#500) has systematically lagged behind historical trends for the last 5 years and now appears to be on a different growth trajectory than before. From 1994 to 2008 it grew by 90% per year. Since 2008 it has grown by only 55% per year.
    • The growth of the average performance of all systems on the list has lagged behind historical averages only for the last two lists. This average is noticeably influenced by the very large systems at the top of the list. Installations of very large systems up to June 2013 have counteracted the reduced growth rate at the bottom of the list. This indicates that the market for the very largest systems might currently behave differently from the market for mid-sized and smaller supercomputers.
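To see what the change from 90% to 55% annual growth means for the #500 system, one can compound both rates from the June 2008 inflection point. The following is a minimal Python sketch that assumes both rates stay constant and takes 2020 as the "end of decade"; the function name `lag_factor` and the choice of years are illustrative assumptions, not part of the slides.

```python
def lag_factor(old_rate: float, new_rate: float, years: float) -> float:
    """Ratio of old-trend to new-trend performance after `years`.

    Rates are annual growth as fractions (0.90 means +90% per year).
    """
    return ((1 + old_rate) / (1 + new_rate)) ** years


# Assumption: both trends start from the same value in 2008.
for year in (2014, 2020):
    factor = lag_factor(0.90, 0.55, year - 2008)
    print(f"{year}: #500 is behind the old trend by a factor of {factor:.1f}")
```

Under these assumptions the gap is a bit over 3x by 2014 and above 10x by 2020, which is the same order of magnitude as the "lagging 10x by end of decade" remark in the slide notes.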
  • 8. Annual Performance Increase of the TOP500 [chart, 1994-2014; series: Moore’s Law, TOP500 Trend; y-axis 1 to 2.6]
  • 9. Performance Development [chart, 1994-2014, log scale from 100 Mflop/s to 1 Eflop/s; series: SUM, N=1, N=500; labeled start values: 1.17 TFlop/s (SUM), 59.7 GFlop/s (N=1), 400 MFlop/s (N=500); labeled end values: 274 PFlop/s (SUM), 33.9 PFlop/s (N=1), 134 TFlop/s (N=500)]
  • 10. Performance Development [same chart as slide 9, with June 2008 marked]
  • 11. Projected Performance Development [chart, 1994-2020, log scale from 100 Mflop/s to 1 Eflop/s; series: SUM, N=1, N=500]
  • 12. Performance Fraction of TOP5 Systems [chart, 1994-2014; y-axis 0% to 18%; series: ranks 1 to 5]
  • 13. Rank at which Half of total Performance is accumulated [chart, 1994-2014; y-axis 0 to 100]
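The quantity plotted on slide 13 can be computed with a running sum over the systems sorted by performance. The following is a minimal Python sketch; the function name `half_performance_rank` is hypothetical, and the sample data are only the ten Rmax values from the TOP10 table, used for illustration rather than the full 500-entry list.

```python
from itertools import accumulate


def half_performance_rank(rmax_values):
    """Smallest rank k such that the top-k systems hold at least half the total."""
    ordered = sorted(rmax_values, reverse=True)
    total = sum(ordered)
    for rank, running in enumerate(accumulate(ordered), start=1):
        if 2 * running >= total:
            return rank
    raise ValueError("rmax_values must not be empty")


# Rmax in Pflop/s of the TOP10 only (illustrative, not the full list)
top10 = [33.9, 17.6, 17.2, 10.5, 8.59, 6.27, 5.17, 5.01, 4.29, 3.14]
print(half_performance_rank(top10))
```

A small rank means that a handful of very large systems dominate the total, which is the concentration effect the slide tracks over time.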
  • 14. Gini Coefficient
    • A measure of statistical dispersion intended to represent inequality
    – Area A above the Lorenz curve (cumulative distribution)
    – Gini = A/(A+B)
    – 0: All members have the same
    – 1: One member has everything
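The definition Gini = A/(A+B) can be evaluated directly from a list of values. The following is a minimal Python sketch that assumes the standard reading in which B is the area under the Lorenz curve and A is the area between that curve and the line of perfect equality, so that A + B = 0.5; the function name `gini` and the sample inputs are illustrative, not taken from the slides.

```python
def gini(values):
    """Gini coefficient A/(A+B) from the Lorenz curve of non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")

    # Area B under the Lorenz curve, by the trapezoid rule over n equal steps.
    cumulative_share = 0.0
    area_b = 0.0
    for x in xs:
        previous = cumulative_share
        cumulative_share += x / total
        area_b += (previous + cumulative_share) / (2 * n)

    area_a = 0.5 - area_b  # area between the equality line and the curve
    return area_a / (area_a + area_b)


print(gini([5, 5, 5, 5]))   # all members have the same
print(gini([0, 0, 0, 10]))  # one member has everything
```

With a finite number of members the maximum is 1 - 1/n rather than exactly 1, so a list of 500 systems in which one system held all Rmax would score 0.998.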
  • 15. Gini Coefficient of the TOP500 [chart, 1994-2014; y-axis 30 to 70]
  • 16. Accelerators [chart, 2006-2014; y-axis: systems, 0 to 70; series: Intel Xeon Phi, Clearspeed, IBM Cell, ATI Radeon, Nvidia Kepler, Nvidia Fermi]
  • 17. Performance Share of Accelerators [chart, 2006-2014; y-axis: fraction of total TOP500 performance, 0% to 40%]
  • 18. System Age (in List Count) [chart; number of systems, 0 to 500; series: age of 0, 1, 2, 3, and 4 lists]
  • 19. Average System Age [chart, 1995-2013; y-axis: age in years, 0 to 2.5; labeled value: 1.27 years]
  • 20. Countries / System Share
    United States 47%, China 15%, Japan 6%, United Kingdom 6%, France 5%, Germany 4%, India 2%, Canada 2%, Others 13%
  • 21. Performance of Countries [chart, 2000-2014, log scale; y-axis: total performance in Tflop/s; series: US, EU, Japan, China]
  • 22. Vendors / System Share
    Vendor | Systems | Share
    HP | 182 | 36%
    IBM | 176 | 35%
    Cray Inc. | 51 | 10%
    SGI | 19 | 4%
    Bull | 16 | 3%
    Dell | 8 | 2%
    Fujitsu | 8 | 2%
    NUDT | 4 | 1%
    MEGWARE | 4 | 1%
    Others | 32 | 6%
  • 23. Vendors (TOP50) / System Share
    Vendor | Systems | Share
    IBM | 14 | 28%
    Cray Inc. | 12 | 24%
    SGI | 5 | 10%
    Fujitsu | 3 | 6%
    NUDT | 3 | 6%
    Bull | 3 | 6%
    Others | 10 | 20%
  • 24. Cores per Socket [chart, 2002-2014; number of systems, 0 to 500; series: 1, 2, 4, 6, 8, 9, 10, 12, and 16 cores]
  • 25. Power Consumption [chart, 2008-2014; y-axis: power in MW, 0 to 8; series: TOP10, TOP50, TOP500; annotations: 2.7x in 5 years, 2.6x in 5 years, 3.0x in 5 years]
  • 26. Power Efficiency [chart, 2008-2014; y-axis: Linpack/Power in Gflops/kW, 0 to 2,500; series: TOP10, TOP50, TOP500]
  • 27. Power Efficiency [chart, 2008-2014; y-axis: Linpack/Power in Gflops/kW, 0 to 4,000; series: TOP10, TOP50, TOP500, Max-Efficiency; labeled points: BlueGene/Q, Cell, MIC, AMD FirePro, Tsubame KFC, NVIDIA K20x]
  • 28. Most Power Efficient Architectures
    Computer | Rmax/Power [Mflops/Watt]
    Tsubame KFC, NEC, Xeon 6C 2.1GHz, Infiniband FDR, NVIDIA K20x | 3,418
    Romeo, Bull Cluster, Xeon 8C 2.6GHz, Infiniband FDR, NVIDIA K20x | 3,131
    HA-PACS TCA, Cray Cluster, Xeon 10C 2.8GHz, QDX, NVIDIA K20x | 2,980
    SANAM, Adtech, ASUS, Xeon 8C 2.0GHz, Infiniband FDR, AMD FirePro | 2,973
    HPC2, iDataPlex DX360, Xeon 10C 2.8GHz, Infiniband FDR, NVIDIA K20x | 2,702
    Piz Daint, Cray XC30, Xeon 8C 2.6GHz, Aries, NVIDIA K20x | 2,697
    Shadow, Cray CS300-LC, Xeon E5-2680v2 10C 2.8GHz, Infiniband FDR, Xeon Phi 5110P | 2,495
    BlueGene/Q, Power BQC 16C 1.60 GHz, Custom | 2,300
    HPCC, Cluster Platform SL250s, Xeon 8C 2.4GHz, FDR, NVIDIA K20m | 2,243