SC10 Diary
  • Jack Dongarra talked about the Top10 list
  • China's system is #1, the world's leader; the director of the project presented.
  • France - #1 in Europe: Bull's Tera 100, with Mellanox technology and Voltaire switches
  • Student Cluster Competition - a supercomputer within a 26-amp power limit (about three coffee makers) - 1+ TFlops
  • Differences between Ethernet and InfiniBand (IB)

  • Tsubame 2.0 – Windows, Voltaire

    TOP500 Highlights - November 2010
    ·      The Chinese Tianhe-1A system is the new No. 1 on the TOP500 and clearly in the lead with 2.57 petaflop/s performance.
    ·      No. 3 is also a Chinese system called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVIDIA Tesla C2050 GPUs
    ·      There are seven petaflop/s systems in the TOP10
    ·      The U.S. is tops in petaflop/s with three systems performing at the petaflop/s level
    ·      The two Chinese systems and the new Japanese Tsubame 2.0 system at No. 4 are all using NVIDIA GPUs to accelerate computation and a total of 28 systems on the list are using GPU technology.
    ·      China keeps increasing its number of systems to 41 and is now clearly the No. 2 country, as a user of HPC, ahead of Japan, France, Germany, and UK.
    ·      The Jaguar system at Oak Ridge National Laboratory slipped to the No. 2 spot with 1.75 Pflop/s Linpack performance.
    ·      The most powerful system in Europe is a Bull system at the French CEA at No. 6.
    ·      Intel dominates the high-end processor market, with 79.6 percent of all systems and over 90 percent of quad-core based systems.
    ·      Intel’s Westmere processors increased their presence in the list with 56 systems, compared with seven in the last list.
    ·      Quad-core processors are used in 73 percent of the systems, while 19 percent of the systems use processors with six or more cores.
    ·      Other notable systems are:
    -     The GRAPE custom accelerator-based systems in Japan at No. 280 and No. 384
    -     The No. 4 system Tsubame 2.0, which can run a Windows OS and achieves almost identical performance doing so.
    ·      Cray regained the No. 2 spot in market share by total performance from Hewlett-Packard, but IBM stays well ahead.
    ·      Cray's XT system series remains very popular with big research customers, with four systems in the TOP10 (two new and two previously listed).
    Power consumption of supercomputers
    ·      TOP500 now tracks actual power consumption of supercomputers in a consistent fashion.
    ·      Only 25 systems on the list are confirmed to use more than 1 megawatt (MW) of power.
    ·      The No. 2 system Jaguar reports the highest total power consumption of 6.95 MW.
    ·      Average power consumption of a TOP500 system is 447 KW (up from 397 KW six months ago) and average power efficiency is 219 Mflops/watt (up from 195 Mflops/watt six months ago).
    ·      Average power consumption of a TOP10 system is 3.2 MW (up from 2.89 MW six months ago) and average power efficiency is 268 Mflops/watt (down from 300 Mflops/watt six months ago).
    ·      The most energy-efficient supercomputers are based on:
    -       The BlueGene/Q prototype with 1680 Mflops/watt
    -       The Fujitsu K computer at RIKEN with 829 Mflops/watt
    -       QPACE clusters based on IBM PowerXCell 8i processor blades in Germany (up to 774 Mflops/watt)
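The Mflops/watt figures above are simply Linpack Rmax divided by measured power draw. A quick illustrative check against numbers quoted in this list (Jaguar's Rmax and power from the highlights above):

```python
# Power efficiency = Linpack Rmax / power draw, in Mflops/watt.
# Input figures are taken from the TOP500 highlights above.
def mflops_per_watt(rmax_tflops, power_kw):
    """Convert Rmax in Tflop/s and power in kW to Mflops/watt."""
    return (rmax_tflops * 1e6) / (power_kw * 1e3)

# Jaguar: 1.75 Pflop/s Linpack at 6.95 MW
jaguar = mflops_per_watt(1750.0, 6950.0)
print(f"Jaguar: {jaguar:.0f} Mflops/watt")  # roughly 252
```

This puts Jaguar close to the 268 Mflops/watt TOP10 average, while the BlueGene/Q prototype above is more than six times as efficient.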
    Highlights from the Top 10:
    ·      The new Chinese Tianhe-1A system is the new No. 1 on the TOP500 and clearly in the lead with 2.57 petaflop/s performance.
    ·      The TOP10 features five new systems, four of which show more than one petaflop/s Linpack performance, bringing the total number of petaflop/s systems up to seven.
    ·      The Chinese Nebulae system, which had its debut in the TOP500 only six months ago, is at No. 3 and is the second Chinese system in the TOP10.
    ·      Tsubame 2.0 is new, coming in at No. 4.
    ·      At No. 5 is a new Cray XE6 system installed at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL). This is the third U.S. system ever to break the petaflop/s barrier, after Roadrunner (No. 7) and Jaguar (No. 2).
    ·      New to the list are Hopper (No. 5) and Cielo (No. 10), both Cray machines.
    ·      The other new system is at CEA in France (No. 6).
    ·      The U.S. has only five systems in the TOP10: Nos. 2, 5, 7, 8, and 10. The others are in China, Japan, France and Germany.
    General highlights from the TOP500 since the last edition:
    ·      Already 95 systems are using processors with 6 or more cores. Quad-core processor-based systems still dominate the TOP500, as 365 systems are using them and 37 systems are still using dual-core processors.
    ·      The entry level to the list moved up to the 31.1 Tflop/s mark on the Linpack benchmark, compared to 24.7 Tflop/s six months ago.
    ·      The last system on the newest list was listed at position 305 in the previous TOP500 just six months ago. This turnover rate is about average after the rather low replacement rate six months ago.
    ·      Total combined performance of all 500 systems has grown to 44.2 Pflop/s, compared to 32.4 Pflop/s six months ago and 27.6 Pflop/s one year ago.
    ·      The entry point for the TOP100 increased in six months from 52.84 Tflop/s to 75.76 Tflop/s.
    ·      The average concurrency level in the TOP500 is 13,071 cores per system, up from 10,267 six months ago and 9,174 one year ago.
    ·      A total of 398 systems (79.6 percent) are now using Intel processors. This is slightly down from six months ago (406 systems, 81.2 percent). Intel continues to provide the processors for the largest share of TOP500 systems.
    ·      They are now followed by the AMD Opteron family with 57 systems (11.4 percent), up from 47.
    ·      The share of IBM Power processors is slowly declining, now accounting for 40 systems (8.0 percent), down from 42.
    ·      17 systems use GPUs as accelerators: six of these use Cell processors, ten use NVIDIA chips, and one uses ATI Radeon.
    ·      Gigabit Ethernet is still the most-used internal system interconnect technology (227 systems, down from 244 systems), due to its widespread use at industrial customers, followed by InfiniBand technology with 214 systems, up from 205 systems.
    ·      However, InfiniBand-based systems account for more than twice as much performance (20.4 Pflop/s) as Gigabit Ethernet ones (8.7 Pflop/s).
    ·      IBM and Hewlett-Packard continue to sell the bulk of the systems at all performance levels of the TOP500.
    ·      IBM kept its lead in systems and now has 200 systems (40 percent), compared to HP with 158 systems (31.6 percent). Six months ago, HP had 185 systems (37 percent) and IBM had 198 systems (39.8 percent).
    ·      IBM remains the clear leader in the TOP500 list in performance with 27.4 percent of installed total performance (down from 33.6 percent). HP lost the second place in this category to Cray. HP went down to 15.6 percent from 20.4 percent, while Cray increased to 19.1 percent from 14.8 percent.
    ·      In the system category, Cray, SGI, and Dell follow with 5.8 percent, 4.4 percent and 4.0 percent respectively.
    ·      In the performance category, the manufacturers with more than 5 percent are NUDT, which engineered the No. 1 and No. 12 systems (7.1 percent of performance), and SGI (5.7 percent).
    ·      HP (137) and IBM (136) together sold 273 out of 281 systems at commercial and industrial customers and have had this important market segment clearly cornered for some time now.
    ·      The U.S. is clearly the leading consumer of HPC systems with 274 of the 500 systems (down from 282). The European share (125 systems – down from 144) is still substantially larger than the Asian share (84 systems – up from 57).
    ·      Dominant countries in Asia are China with 41 systems (up from 24), Japan with 26 systems (up from 18), and India with 4 systems (down from five).
    ·      In Europe, Germany and France caught up with the UK. The UK dropped from the No. 1 position and now has 24 systems (38 six months ago). France and Germany passed the UK and now have 26 systems each (France down from 29, Germany up from 24 six months ago).
    Highlights from the TOP50:
    ·      The entry level into the TOP50 is at 126.5 Tflop/s
    ·      The U.S. share of systems in the TOP50 (50 percent) is similar to its share in the TOP500 (54.8 percent).
    ·      China is already following with five systems (10 percent).
    ·      Cray has passed IBM and now leads the TOP50 with 34 percent of systems and 33 percent of performance.
    ·      No. 2 is now IBM with a share of 18 percent of systems and 17 percent of performance.
    ·      66 percent of systems are installed at research labs and 22 percent at universities.
    ·      There is only a single system using Gigabit Ethernet in the TOP50.
    ·      The average concurrency level is 64,618 cores per system – up from 49,080 cores per system six months ago and 44,338 one year ago.
    All changes are from June 2010 to November 2010.
    About the TOP500 List
    The TOP500 list is compiled by Hans Meuer of the University of Mannheim, Germany; Erich Strohmaier and Horst Simon of NERSC/Lawrence Berkeley National Laboratory; and Jack Dongarra of the University of Tennessee, Knoxville. For more information, visit
  • SESSION: Plenary and Kennedy Award Speakers
    EVENT TYPE: Invited Speaker
    TIME: 8:30AM - 9:15AM
    Speaker(s): Bill Dally
    ABSTRACT: Performance per Watt is the new performance. In today’s power-limited regime, GPU Computing offers significant advantages in performance and energy efficiency. In this regime, performance derives from parallelism and efficiency derives from locality. Current GPUs provide both, with up to 512 cores per chip and an explicitly-managed memory hierarchy. This talk will review the current state of GPU computing and discuss how we plan to address the challenges of ExaScale computing. Achieving ExaFLOPS of sustained performance in a 20MW power envelope requires significant power reduction beyond what will be provided by technology scaling. Efficient processor design along with aggressive exploitation of locality is expected to address this power challenge. A focus on vertical rather than horizontal locality simplifies many issues including load balance, placement, and dynamic workloads. Efficient mechanisms for communication, synchronization, and thread management will be required to achieve the strong scaling required to achieve the 10^10-thread parallelism needed to sustain an ExaFLOPS on reasonable-sized problems. Resilience will be achieved through a combination of hardware mechanisms and an API that allows programs to specify when and where protection is required. Programming systems will evolve to improve programmer productivity with a global address space and global data abstractions while improving efficiency via machine independent abstractions for locality.
    Speaker Details:
    Bill Dally - NVIDIA/Stanford University
    My notes:
    CUDA is the most popular language these days.
    Fermi -> Kepler -> Maxwell
    Core i3 + c2060  best GFLOPS/W
    DARPA Exascale study (download PDF)
    Today: 5GFLOPs/W
    Exascale: 50GFLOPs/W
    So we need to improve by a factor of 10. This is achievable: x4 by architecture change and an additional x4 by technological change.
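The factor of 10 in the notes above is just the ratio of the two efficiency figures, and the 20 MW exascale envelope from the abstract gives the same target. A quick sanity check (the x4 factors are the talk's claims, not independently verified):

```python
# Exascale power-efficiency gap, per the talk notes:
# ~1 EFLOPS sustained in a 20 MW envelope requires 50 GFLOPS/W.
target_gflops_per_w = 1e18 / 20e6 / 1e9  # = 50.0
today_gflops_per_w = 5.0
gap = target_gflops_per_w / today_gflops_per_w
print(gap)  # 10.0

# Claimed headroom: ~4x from architecture, ~4x from process technology.
headroom = 4 * 4  # 16x, comfortably above the required 10x
assert headroom > gap
```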
  • CAPS – by HMPP
  • Scint – 100GB
  • IBM 94TF in one rack
    1. SC10 Guy Tel-Zur November 2010
    2. My Own Diary • A subjective impression from SC10
    3. Outline • The Tutorials • Plenary Talks • Papers & Panels • The Top500 list • The Exhibition
    4. Day 0 - Arrival: US Airways entertainment system is running Linux!
    5. A lecture by Prof. Rubin Landau at the Educational Track (3:30PM - 5:00PM, Communities, Education). Physics: Examples in Computational Physics, Part 2 - Rubin Landau. Although physics faculty are incorporating computers to enhance physics education, computation is often viewed as a black box whose inner workings need not be understood. We propose to open up the computational black box by providing Computational Physics (CP) curricula materials based on a problem-solving paradigm that can be incorporated into existing physics classes, or used in stand-alone CP classes. The curricula materials assume a computational science point of view, where understanding of the applied math and the CS is also important, and usually involve a compiled language in order for the students to get closer to the algorithms. The materials derive from a new CP eTextbook available from Compadre that includes video-based lectures, programs, applets, visualizations and animations.
    6. Eclipse PTP (Parallel Tools Platform) - At last FORTRAN has an advanced, free IDE!
    7. Elastic-R
    8. VisIt
    9. Python for Scientific Computing
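The flavor of that tutorial track can be shown in a few lines. A minimal sketch (my own example, not from the slides) of the vectorized NumPy style these courses teach:

```python
import numpy as np

# Vectorized NumPy code replaces explicit Python loops:
# integrate sin(x) over [0, pi] with a simple Riemann sum
# (the exact value of the integral is 2).
x = np.linspace(0.0, np.pi, 1_000_001)
dx = x[1] - x[0]
integral = np.sin(x).sum() * dx
print(integral)  # close to 2.0
```

The whole computation runs in compiled code over a million-point grid, which is exactly the pitch made for Python in scientific computing.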
    10. Amazon Cluster GPU Instances provide 22 GB of memory, 33.5 EC2 Compute Units, and utilize the Amazon EC2 Cluster network, which provides high throughput and low latency for High Performance Computing (HPC) and data intensive applications. Each GPU instance features two NVIDIA Tesla® M2050 GPUs, delivering peak performance of more than one trillion double-precision FLOPS. Many workloads can be greatly accelerated by taking advantage of the parallel processing power of hundreds of cores in the new GPU instances. Many industries including oil and gas exploration, graphics rendering and engineering design are using GPU processors to improve the performance of their critical applications. Amazon Cluster GPU Instances extend the options for running HPC workloads in the AWS cloud. Cluster Compute Instances, launched earlier this year, provide the ability to create clusters of instances connected by a low latency, high throughput network. Cluster GPU Instances give customers with HPC workloads an additional option to further customize their high performance clusters in the cloud. For those customers who have applications that can benefit from the parallel computing power of GPUs, Amazon Cluster GPU Instances can often lead to even further efficiency gains over what can be achieved with traditional processors. By leveraging both instance types, HPC customers can tailor their compute cluster to best meet the performance needs of their workloads. For more information on HPC capabilities provided by Amazon EC2, see the Amazon Cluster GPU Instances page.
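The "more than one trillion double-precision FLOPS" claim can be sanity-checked from the M2050's published specifications. The core count, clock, and double-precision ratio below are NVIDIA's Fermi specs, not figures from the slide:

```python
# Back-of-envelope check of the per-instance peak for two Tesla M2050s.
cores = 448            # CUDA cores per M2050
clock_ghz = 1.15       # shader clock
flops_per_core = 2     # one fused multiply-add per cycle
dp_ratio = 0.5         # Fermi runs double precision at half the SP rate

dp_gflops_per_gpu = cores * clock_ghz * flops_per_core * dp_ratio
instance_dp_tflops = 2 * dp_gflops_per_gpu / 1000.0
print(f"{instance_dp_tflops:.2f} TFLOPS")  # ~1.03, just over one trillion
```

So the instance peaks at roughly 515 double-precision GFLOPS per GPU, and the two GPUs together just clear the one-teraflop mark quoted above.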
    11. The Top500
    12. World's #1: China's National University of Defense Technology's Tianhe-1A supercomputer has taken the top ranking from Oak Ridge National Laboratory's Jaguar supercomputer on the latest Top500 ranking of the world's fastest supercomputers. The Tianhe-1A achieved a performance level of 2.57 petaflop/s, while Jaguar achieved 1.75 petaflop/s. The Nebulae, another Chinese-built supercomputer, came in third with a performance of 1.27 petaflop/s. "What the Chinese have done is they're exploiting the power of [graphics processing units], which are...awfully close to being uniquely suited to this particular benchmark," says University of Illinois Urbana-Champaign professor Bill Gropp. Tianhe-1A is a Linux computer built from components from Intel and NVIDIA. "What we should be focusing on is not losing our leadership and being able to apply computing to a broad range of science and engineering problems," Gropp says. Overall, China had five supercomputers ranked in the top 100, while 42 of the top 100 computers were U.S. systems.
    13. The Top 10
    14.
    15. Talks
    16. SC10 Keynote Lecture: Clayton M. Christensen - Harvard Business School
    17. How to Create New Growth in a Risk-Minimizing Environment. Disruption is the mechanism by which great companies continue to succeed and new entrants displace the market leaders. Disruptive innovations either create new markets or reshape existing markets by delivering relatively simple, convenient, low cost innovations to a set of customers who are ignored by industry leaders. One of the bedrock principles of Christensen's disruptive innovation theory is that companies innovate faster than customers' lives change. Because of this, most organizations end up producing products that are too good, too expensive, and too inconvenient for many customers. By only pursuing these "sustaining" innovations, companies unwittingly open the door to "disruptive" innovations, be it "low-end disruption" targeting overshot, less-demanding customers or "new-market disruption" targeting non-consumers. 1. Many of today's markets that appear to have little growth remaining actually have great growth potential through disruptive innovations that transform complicated, expensive products into simple, affordable ones. 2. Successful innovation seems unpredictable because innovators rely excessively on data, which is only available about the past; they have not been equipped with sound theories that allow them to see the future perceptively. This problem has been solved. 3. Understanding the customer is the wrong unit of analysis for successful innovation; understanding the job that the customer is trying to do is the key. 4. Many innovations that have extraordinary growth potential fail, not because of the product or service itself, but because the company forced it into an inappropriate business model instead of creating a new optimal one. 5. Companies with disruptive products and business models are the ones whose share prices increase faster than the market over sustained periods.
    18. SC10 Keynote Speaker
    19. High-End Computing and Climate Modeling: Future Trends and Prospects. SESSION: Big Science, Big Data II. Presenter(s): Phillip Colella. ABSTRACT: Over the past few years, there has been considerable discussion of the change in high-end computing, due to the change in the way increased processor performance will be obtained: heterogeneous processors with more cores per chip, deeper and more complex memory and communications hierarchies, and fewer bytes per flop. At the same time, the aggregate floating-point performance at the high end will continue to increase, to the point that we can expect exascale machines by the end of the decade. In this talk, we will discuss some of the consequences of these trends for scientific applications from a mathematical algorithm and software standpoint. We will use the specific example of climate modeling as a focus, based on discussions that have been going on in that community for the past two years. Chair/Presenter Details: Patricia Kovatch (Chair) - University of Tennessee, Knoxville; Phillip Colella - Lawrence Berkeley National Laboratory
    20. Prediction of Earthquake Ground Motions Using Large-Scale Numerical Simulations. SESSION: Big Science, Big Data II. Presenter(s): Tom Jordan. ABSTRACT: Realistic earthquake simulations can now predict strong ground motions from the largest anticipated fault ruptures. Olsen et al. (this meeting) have simulated a M8 “wall-to-wall” earthquake on the southern San Andreas fault up to 2-Hz, sustaining 220 teraflops for 24 hours on 223K cores of NCCS Jaguar. Large simulation ensembles (~10^6) have been combined with probabilistic rupture forecasts to create CyberShake, a physics-based hazard model for Southern California. In the highly-populated sedimentary basins, CyberShake predicts long-period shaking intensities substantially higher than empirical models, primarily due to the strong coupling between rupture directivity and basin excitation. Simulations are improving operational earthquake forecasting, which provides short-term earthquake probabilities using seismic triggering models, and earthquake early warning, which attempts to predict imminent shaking during an event. These applications offer new and urgent computational challenges, including requirements for robust, on-demand supercomputing and rapid access to very large data sets.
    21. Panel
    22. Exascale Computing Will (Won't) Be Used by Scientists by the End of This Decade. EVENT TYPE: Panel. Panelists: Marc Snir, William Gropp, Peter Kogge, Burton Smith, Horst Simon, Bob Lucas, Allan Snavely, Steve Wallach. ABSTRACT: DOE has set a goal of Exascale performance by 2018. While not impossible, this will require radical innovations. A contrarian view may hold that technical obstacles, cost, limited need, and inadequate policies will delay exascale well beyond 2018. The magnitude of the required investments will lead to a public discussion for which we need to be well prepared. We propose to have a public debate on the proposition "Exascale computing will be used by the end of the decade", with one team arguing in favor and another team arguing against. The arguments should consider technical and non-technical obstacles and use cases. The proposed format is: (a) introductory statements by each team; (b) Q&As where each team can put questions to the other team; (c) Q&As from the public to either team. We shall push to have a lively debate that is not only informative, but also entertaining.
    23. GPU Computing: To ExaScale and Beyond. Bill Dally - NVIDIA/Stanford University
    24. Dedicated High-End Computing to Revolutionize Climate Modeling: An International Collaboration. ABSTRACT: A collaboration of six institutions on three continents is investigating the use of dedicated HPC resources for global climate modeling. Two types of experiments were run using the entire 18,048-core Cray XT-4 at NICS from October 2009 to March 2010: (1) an experimental version of the ECMWF Integrated Forecast System, run at several resolutions down to 10 km grid spacing to evaluate high-impact and extreme events; and (2) the NICAM global atmospheric model from JAMSTEC, run at 7 km grid resolution to simulate the boreal summer climate, over many years. The numerical experiments sought to determine whether increasing weather and climate model resolution to accurately resolve mesoscale phenomena in the atmosphere can improve the model fidelity in simulating the mean climate and the distribution of variances and covariances. Chair/Presenter Details: Robert Jacob (Chair) - Argonne National Laboratory; James Kinter - Institute of Global Environment and Society
    25. Using GPUs for Weather and Climate Models. Presenter(s): Mark Govett. ABSTRACT: With the power, cooling, space, and performance restrictions facing large CPU-based systems, graphics processing units (GPUs) appear poised to become the next-generation supercomputers. GPU-based systems already account for two of the top ten fastest supercomputers on the Top500 list, with the potential to dominate this list in the future. While the hardware is highly scalable, achieving good parallel performance can be challenging. Language translation, code conversion and adaptation, and performance optimization will be required. This presentation will survey existing efforts to use GPUs for weather and climate applications. Two general parallelization approaches will be discussed. The most common approach is to run select routines on the GPU, but this requires data transfers between CPU and GPU. Another approach is to run everything on the GPU and avoid the data transfers, but this can require significant effort to parallelize and optimize the code.
    26. Global Arrays Roadmap and Future Developments. SESSION: Global Arrays: Past, Present & Future. EVENT TYPE: Special and Invited Events. SESSION CHAIR: Moe Khaleel. Speaker(s): Daniel Chavarria. ABSTRACT: This talk will describe the current state of the Global Arrays toolkit and its underlying ARMCI communication layer and how we believe they should evolve over the next few years. The research and development agenda is targeting expected architectural features and configurations on emerging extreme-scale and exascale systems. Speaker Details: Moe Khaleel (Chair) - Pacific Northwest National Laboratory; Daniel Chavarria - Pacific Northwest National Laboratory
    27.
    28. Enabling High Performance Cloud Computing Environments. SESSION LEADER(S): Jurrie Van Den Breekel. ABSTRACT: The cloud is the new “killer” service to bring service providers and enterprises into the age of network services capable of infinite scale. As an example, 5,000 servers with many cloud services could feasibly serve one billion users or end devices. The idea of services at this scale is now possible with multi-core processing, virtualization and high speed Ethernet, but even today the mix of implementing these technologies requires careful considerations in public and private infrastructure design. While cloud computing offers tremendous possibilities, it is critical to understand the limitations of this framework across key network attributes such as performance, security, availability and scalability. Real-world testing of a cloud computing environment is a key step toward putting any concerns to rest around performance, security and availability. Spirent will share key findings that are the result of some recent work with the European Advanced Networking Test Center (EANTC), including a close examination of how implementing a cloud approach within a public or private data center affects the firewall, data center bridging, virtualization, and WAN optimization. Session Leader Details: Jurrie Van Den Breekel (Primary Session Leader) - Spirent Communications
    29. Cont'. Speakers: NEOVISE - Paul Burns; SPIRENT - Jurrie van den Breekel; BROCADE - Steve Smith. Paul: Single application - single server; single application - multiple servers (cluster computing); multiple applications - single server (virtualization); multiple applications - multiple servers (cloud computing). 3rd dimension: tenants, T1 and T2 on the same physical server - security
    30. Friday Panels (19-11-2010)
    31. Future Supercomputing Centers. Thom Dunning, William Gropp, Thomas Lippert, Satoshi Matsuoka, Thomas Zacharia. This panel will discuss the nature of federal- and state-supported supercomputing centers, what is required to sustain them in the future, and how they will cope with the evolution of computing technology. Since the federally supported centers were created in the mid-1980s, they have fueled innovation and discovery, increasing the number of computational researchers, stimulating the use of HPC in industry, and pioneering new technologies. The future of supercomputing is exciting—sustained petascale systems are here, with planning for exascale systems now underway—but it is also challenging: disruptive technology changes will be needed to reach the exascale. How can supercomputing centers help ensure that today's petascale supercomputers are effectively used to advance science and engineering, and how can they help the research and industrial communities prepare for an exciting, if uncertain, future?
    32. Advanced HPC Execution Models: Innovation or Disruption. Panelists: Thomas L. Sterling, William Carlson, Guang Gao, William Gropp, Vivek Sarkar, Kathy Yelick. ABSTRACT: An execution model is the underlying conceptual foundation that integrates the HPC system architecture, programming methods, and intervening Operating System and runtime system software. It is a set of governing principles for the co-design, operation, and interoperability of the system layers to achieve the most efficient scalable computing in terms of time and energy. Historically, HPC has been driven by five previous epochs of execution models, including the most recent CSP, exemplified by "Pax MPI" for almost two decades. HPC is now confronted by a severe barrier of parallelism, power, clock rate, and complexity, exemplified by multicore and GPU heterogeneity, impeding progress between today's Petascale and the end of the decade's Exascale performance. The panel will address the key questions of requirements, form, impact, and programming of such future execution models, should they emerge from research in academia, industry, and government centers.
    33. The Exhibition