SC13 Diary

My Supercomputing 2013 Conference Diary

  • http://sc13.supercomputing.org/schedule/event_detail.php?evid=pec113
  • http://serc.carleton.edu/csinparallel/workshops/sc13/index.html
  • Abstract: Higher levels of abstraction that increase productivity can be designed by specializing them in specific ways. Domain-specific languages, interaction-pattern-specific languages, APGAS languages, and high-level frameworks leverage their own specializations to raise abstraction levels and increase productivity. In this talk, I will present some common support that all such higher-level abstractions need, and the need to encapsulate that support in a single common substrate. In particular, the support includes automatic resource management and other runtime adaptation support, including that for tolerating component failures or handling power/energy issues. Further, I will explore the need to interoperate and coordinate across multiple such paradigms, so that one can construct multi-paradigm applications with ease. I will illustrate the talk with my group's experience in designing multiple interaction-pattern-specific HLLs, and with interoperability among them as well as with the traditional message-passing paradigm of MPI. HLL = High-Level Language; HLPS = High-Level Programming Systems. "Is there life beyond MPI?"
  • Chris Johnson, SCI Institute, University of Utah, Salt Lake City, UT, United States 84112. Website: http://www.sci.utah.edu
  • ABSTRACT: The partitioned global address space (PGAS) programming model strikes a balance between the ease of programming due to its global address memory model and performance due to locality awareness. While developed for scalable systems, PGAS is gaining popularity due to the NUMA memory architectures on many-core chips. Some PGAS implementations include Co-Array Fortran, Chapel, UPC, X10, Phalanx, OpenShmem, Titanium and Habanero. PGAS concepts are influencing new architectural designs and are being incorporated into traditional HPC environments. This BOF will bring together developers, researchers and users for the exchange of ideas and information and to address common issues of concern. SESSION LEADER DETAILS: Tarek El-Ghazawi (Primary Session Leader), George Washington University; Lauren Smith (Secondary Session Leader), US Government
  • ABSTRACT: Map-reduce, the cornerstone computational framework for cloud computing applications, has star appeal to draw students to the study of parallelism. Participants will carry out hands-on exercises designed for students at CS1/intermediate/advanced levels that introduce data-intensive scalable computing concepts, using WebMapReduce (WMR), a simplified open-source interface to the widely used Hadoop map-reduce programming environment, and using Hadoop itself. These hands-on exercises enable students to perform data-intensive scalable computations carried out on the most widely deployed map-reduce framework, used by Facebook, Microsoft, Yahoo, and other companies. WMR supports programming in a choice of languages (including Java, Python, C++, C#, Scheme); participants will be able to try exercises with languages of their choice. Workshop includes brief introduction to direct Hadoop programming, and information about access to cluster resources supporting WMR. Workshop materials will reside on csinparallel.org, along with WMR software. Intended audience: CS instructors. Laptop required (Windows, Mac, or Linux).
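
To make the map-reduce pattern above concrete, here is a minimal word-count sketch in the Hadoop Streaming style (plain Python over stdin/stdout). It is not the WebMapReduce (WMR) interface itself, just the same mapper/reducer idea the workshop exercises teach.

```python
#!/usr/bin/env python
# mapper.py -- emit one (word, 1) pair per word, tab-separated, one per line.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word.lower(), 1))
```

```python
#!/usr/bin/env python
# reducer.py -- input arrives sorted by key; sum the counts for each word.
import sys

current_word, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, total))
        current_word, total = word, 0
    total += int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, total))
```

Locally this pair can be tested as `cat input.txt | python mapper.py | sort | python reducer.py`; on a cluster it would run through Hadoop Streaming (the exact jar path and options depend on the installation).
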
  • Energy aware, both Cray and Bull (IPMI and sensors); FlexLM aware; Hadoop aware; HDF5 aware
  • The Omega Project and Constraint Based Analysis Techniques in High Performance Computing. William Pugh, Professor Emeritus of Computer Science at the University of Maryland at College Park. The Omega test paper was one of the first to suggest general use of an exact algorithm for array data dependence analysis, which is the problem of determining if two array references are aliased. Knowing this is essential to knowing which loops can be run in parallel. Array data dependence is essentially the problem of determining if a set of affine constraints has an integer solution. This problem is NP-complete, but the paper described an algorithm that was both fast in practice and always exact. More important than the fact that the Omega test was exact was that it could also use arbitrary affine constraints (as opposed to many existing algorithms, which could only use constraints occurring in certain pre-defined patterns), and could produce symbolic answers rather than just yes/no answers. This work was the foundation of the Omega project and library, which significantly expanded the capabilities of the Omega test and added to the range of problems and domains it could be applied to. The Omega library could calculate things such as actual data flow (rather than just aliasing), analyze and represent loop transformations, calculate array sections that needed to be communicated, and generate loop nests. This talk will describe the Omega test, the context in which the paper was originally written, the Omega project, and the field of constraint-based program analysis and transformation that it helped open up. http://sc13.supercomputing.org/content/omega-project-and-constraint-based-analysis-techniques-high-performance-computing http://www.cs.umd.edu/projects/omega/ * FindBugs – static analysis of Java code
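
As a toy illustration of the dependence question described above (not the Omega test itself, which answers it symbolically for arbitrary affine constraints): in the loop `for i in 0..n-1: A[2*i] = A[2*i+1] + 1`, a dependence exists only if 2*i = 2*j + 1 has an integer solution with both i and j in bounds.

```python
# Toy, brute-force stand-in for exact dependence testing: does the write
# A[2*i] ever touch the same element as the read A[2*j + 1]?  That is, does
# the affine constraint 2*i == 2*j + 1 have an integer solution, 0 <= i, j < n?
def has_dependence(n):
    return any(2 * i == 2 * j + 1 for i in range(n) for j in range(n))

print(has_dependence(1000))  # False: 2*i is even and 2*j + 1 is odd,
                             # so the iterations are independent and the
                             # loop can safely run in parallel.
```

The Omega test reaches the same conclusion directly from the constraints, without enumerating iterations, and also handles symbolic loop bounds.
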
  • http://insidehpc.com/2013/11/22/sc13-awards-roundup-students-trounce-old-timers-celebrity-pro/
  • http://sc13.supercomputing.org/content/student-cluster-competition
  • https://share.sandia.gov/news/resources/news_releases/computer_test/#.UpDJdMRkOSp
  • The Green500 List - November 2013. The November 2013 release of the Green500 list was announced today at the SC|13 conference in Denver, Colorado, USA. Continuing the trend from previous years, heterogeneous supercomputing systems totally dominate the top 10 spots of the Green500. A heterogeneous system uses computational building blocks that consist of two or more types of “computing brains.” These types of computing brains include traditional processors (CPUs), graphics processing units (GPUs), and co-processors. In this edition of the Green500, one system smashes through the 4-billion floating-point operations per second (gigaflops) per watt barrier. TSUBAME-KFC, a heterogeneous supercomputing system developed at the Tokyo Institute of Technology (TITech) in Japan, tops the list with an efficiency of 4.5 gigaflops/watt. Each computational node within TSUBAME-KFC consists of two Intel Ivy Bridge processors and four NVIDIA Kepler GPUs. In fact, all systems in the top ten of the Green500 use a similar architecture, i.e., Intel CPUs combined with NVIDIA GPUs. Wilkes, a supercomputer housed at Cambridge University, takes the second spot. The third position is filled by the HA-PACS TCA system at the University of Tsukuba. Of particular note, this list also sees two petaflop systems, each capable of computing over one quadrillion operations per second, achieve an efficiency of over 3 gigaflops/watt, namely Piz Daint at Swiss National Supercomputing Center and TSUBAME 2.5 at Tokyo Institute of Technology. Thus, Piz Daint is the greenest petaflop supercomputer on the Green500. As a point of reference, Tianhe-2, the fastest supercomputer in the world according to the Top500 list, achieves an efficiency of 1.9 gigaflops/watt. This list marks a number of “firsts” for the Green500. It is the first time that a supercomputer has broken through the 4 gigaflops/watt barrier. Second, it is the first time that all of the top 10 systems on the Green500 are heterogeneous systems. Third, it is the first time that the average of the measured power consumed by the systems on the Green500 dropped with respect to the previous edition of the list. “A decrease in the average measured power coupled with an overall increase in performance is an encouraging step along the trail to exascale,” noted Wu Feng of the Green500. Fourth, assuming that TSUBAME-KFC’s energy efficiency can be maintained for an exaflop system, it is the first time that an extrapolation to an exaflop supercomputer has dropped below 300 megawatts (MW), specifically 222 MW. “This 222-MW power envelope is still a long way away from DARPA’s target of an exaflop system in the 20-MW power envelope,” says Feng. Starting with this release, the Little Green500 list only includes machines with power values submitted directly to the Green500. In fact, more than 400 systems have submitted directly to the Green500 over the past few years. As in previous years, the Little Green500 list has better overall efficiency than the Green500 list on average. Earlier this year, the Green500 adopted new methodologies for measuring the power of supercomputing systems and providing a more accurate representation of the energy efficiency of large-scale systems. In June 2013, the Green500 formally adopted measurement rules (a.k.a. “Level 1” measurements), developed in cooperation with the Energy-Efficient High-Performance Computing Working Group (EE HPC WG). Moreover, power-measurement methodologies with higher precision and accuracy were developed as a part of this effort (a.k.a. “Level 2” and “Level 3” measurements). With growing support and interest in the energy efficiency of large-scale computing systems, the Green500 welcomed two more submissions at Level 2 and Level 3 than in the previous edition of the Green500 list. Of particular note, Piz Daint, the greenest petaflop supercomputer in the world, submitted the highest-quality Level 3 measurement.
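
A quick back-of-the-envelope check of the 222 MW figure quoted above, using nothing beyond the quoted 4.5 gigaflops/watt:

```python
# Extrapolate an exaflop (1e18 flop/s) machine at TSUBAME-KFC's efficiency.
exaflop = 1e18           # flop/s
efficiency = 4.5e9       # flop/s per watt (4.5 gigaflops/watt)
watts = exaflop / efficiency
print("%.0f MW" % (watts / 1e6))  # ~222 MW, versus DARPA's 20 MW target
```
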
  • The students assemble a system based on an Atom processor and afterwards install cloud software
  • Amanda from Brocade
  • http://www.pcworld.com/article/2064480/nvidia-boosts-supercomputing-speed-with-tesla-k40-graphics-chip.html
  • blog.cyclecomputing.com/2013/11/back-to-the-future-121-petaflopsrpeak-156000-core-cyclecloud-hpc-runs-264-years-of-materials-science.html
  • http://ucsdnews.ucsd.edu/pressrelease/sdsc_uses_meteor_raspberry_pi_cluster_to_teach_parallel_computing
  • http://cseweb.ucsd.edu/~mbtaylor/papers/taylor_landscape_ds_ieee_micro_2013.pdf Multicore scaling leads to large amounts of dark silicon. Across two process generations, there is a spectrum of trade-offs between frequency and core count; these include increasing core count by 2 but leaving frequency constant (top), and increasing frequency by 2 but leaving core count constant (bottom). Any of these trade-off points will have large amounts of dark silicon.
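
A rough, illustrative sketch of the utilization-wall arithmetic behind that caption; the scaling factors below are assumptions chosen for the sketch, not numbers from the paper.

```python
# Illustrative dark-silicon arithmetic (assumed factors, not from the paper):
# post-Dennard, transistor counts keep roughly doubling per generation, but
# per-transistor switching energy improves much more slowly while the chip's
# power budget stays fixed.
transistor_growth_per_gen = 2.0   # assumed: Moore's law continues
energy_drop_per_gen = 1.4         # assumed: voltage scaling has mostly stopped
generations = 2

logic_available = transistor_growth_per_gen ** generations   # 4x more logic on chip
logic_affordable = energy_drop_per_gen ** generations        # ~2x more fits the power budget
utilization = logic_affordable / logic_available
print("fraction usable at full frequency: %.0f%%" % (100 * utilization))
# ~49%: the rest stays "dark" unless frequency or active core count is traded off.
```
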
  • 1 RSC special rack == 8 standard x86 racks

SC13 Diary: Presentation Transcript

  • My SC13 Diary * Guy Tel-Zur * A very subjective review
  • SC13, the 25th anniversary, is shaping up: SCinet is provisioning over 1 Terabit per second of bandwidth; we have 26 conference rooms and 2 ballrooms of technical program papers, tutorials, panels, workshops, and posters; and this year's exhibit features over 350 of the HPC community's leading government, academic, and industry organizations.
  • Sunday 17-Nov-2013 • Workshops – 4th Intl Workshop on Data-Intensive Computing in the Clouds – 4th Workshop on Petascale (Big) Data Analytics • Education – LittleFe Buildout, http://littlefe.net/ – Curriculum Workshop: Mapping CS2013 & NSF/TCPP
  • Monday 18-Nov-2013 • Workshops • Education
  • Education: Perspectives on Broadening Engagement and Education in the Context of Advanced Computing. Irene Qualters, NSF. Program Responsibilities: - Cyber-Enabled Sustainability Science and Engineering (CyberSEES) - High Performance Computing System Acquisition - Interdisciplinary Research in Hazards and Disasters - Petascale Computing Resource Allocations
  • EduPDHPC • Workshop: http://cs.gsu.edu/~tcpp/curriculum/?q=edupdhpc • Program: http://cs.gsu.edu/~tcpp/curriculum/?q=EduPDHPC13_Technical_Program • Talks I attended: – A Curricular Experience With Parallel Computational Thinking: A Four Years Journey, Edusmildo Orozco et al. – …and see next two slides
  • Teaching parallel programming to undergrads with hands-on experience Rainer Keller, Hochschule fuer Technik Stuttgart -- University of Applied Science, Germany
  • Mapping CS2013 and NSF/TCPP parallel and distributed computing recommendations and resources to courses http://serc.carleton.edu/csinparallel/workshops/sc13/index.html
  • http://cs.gsu.edu/~tcpp/curriculum/sites/default/files/xsede_overview.pdf
  • Python for High Performance and Scientific Computing (PyHPC 2013) Talks I attended: – NumFOCUS: A Foundation Supporting Open Scientific Software – Synergia: Driving Massively Parallel Particle Accelerator Simulations with Python – Compiling Python Modules to Native Parallel Modules Using Pythran and OpenMP Annotations
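
A minimal sketch of the Pythran approach from the last talk above; the example is assumed (based on Pythran's documented export annotation), not taken from the talk. The idea is that a pure-Python module gains a type-annotation comment and is then compiled to a native module.

```python
# dotprod.py -- ordinary Python that Pythran can also compile to a native module.
# The "#pythran export" comment declares which function to expose and with what
# argument types; the file still runs unmodified under the normal interpreter.
#pythran export dprod(float list, float list)

def dprod(l0, l1):
    # The talk pairs this with OpenMP annotations; the plain sequential version
    # is kept here to avoid guessing the exact directive-comment syntax.
    return sum(x * y for x, y in zip(l0, l1))
```

Compilation is roughly `pythran dotprod.py`, after which `import dotprod` loads the native extension instead of the interpreted code.
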
  • Python for High Performance and Scientific Computing (PyHPC 2013) Links: • PyHPC 2013 on Facebook: https://www.facebook.com/events/179383998878612/ • PyHPC 2013 Slides: www.dlr.de/sc/en/desktopdefault.aspx/tabid9001/15542_read-38237/ • PyHPC: http://pyhpc.org • NumFocus: http://numfocus.org
  • WOLFHPC: Workshop on Domain-Specific Languages and High-Level Frameworks for HPC • http://hpc.pnl.gov/conf/wolfhpc/2013/ • Keynote Speaker: Laxmikant Kale, University of Illinois at Urbana-Champaign (http://charm.cs.illinois.edu): What Parallel HLLs Need; Charm++
  • Tuesday 19-Nov-2013 • Conference Opening • Awards • Invited Keynote
  • Invited Keynote Genevieve Bell - The Secret Life of Data Today Big Data is one of the hottest buzzwords in technology, but from an anthropological perspective Big Data has been with us for millennia, in forms such as census information collected by ancient civilizations. The next 10 years will be shaped primarily by new algorithms that make sense of massive and diverse datasets and discover hidden value. Could we ignite our creativity by looking at data from a fresh perspective? What if we designed for data like we design for people? This talk will explore the secret life of data from an anthropological point of view to allow us to better understand its needs -- and its astonishing potential -- as we embark to create a new data society.
  • Wednesday 20-Nov-2013 • My agenda – 2 keynotes – 2 invited speakers – PGAS BoF – Education Map-Reduce – The Exhibition
  • Warren Washington: Climate Earth System Modeling for the IPCC Sixth Assessment Report (AR6): Higher Resolution and Complexity
  • Saul Perlmutter: Data, Computation, and the Fate of the Universe
  • Invited Talks • The Transformative Impact of Computation in a Data-Driven World • Europe's Place in a Global Race
  • PGAS BoF
  • CSinParallel: Using Map-Reduce to Teach Data-Intensive Scalable Computing Across the CS Curriculum http://csinparallel.org http://serc.carleton.edu/csinparallel/workshops/sc13wmr/index.html Dick Brown, St. Olaf College
  • Thursday 21-Nov-2013 • Snow storm • SLURM BoF Next SLURM users meeting: 23-24/9/2014 @ Swiss National Supercomputing Center, Switzerland
  • ACM Athena Lecturer Award The ACM Athena Lecturer Award celebrates women researchers who have made fundamental contributions to Computer Science. Sponsored by the ACM, the award includes a $10,000 honorarium. This year’s ACM Athena Lecturer Award winner is Katherine Yelick, Professor of Electrical Engineering and Computer Sciences, University of California, Berkeley, and Associate Lab Director for Computing Sciences, Lawrence Berkeley National Laboratory.
  • The SC Test of Time Award The SC Test of Time Award recognizes a seminal technical paper from past SC conferences that has transformed high performance computing, storage, or networking. The inaugural winner of the SC Test of Time Award is William Pugh, emeritus Professor of Computer Science at the University of Maryland at College Park.
  • Bill Pugh
  • SC13 Posters
  • Awards • The Best Paper Award went to “Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided,” written by Robert Gerstenberger, University of Illinois at Urbana-Champaign, and Maciej Besta and Torsten Hoefler, both of ETH Zurich (a minimal sketch of the MPI-3 one-sided model appears after the awards slides below). • The Best Student Paper Award was given to “Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?” written by Nikola Rajovic of the Barcelona Supercomputing Center. • The ACM Gordon Bell Prize for best performance of a high performance application went to “11 PFLOP/s Simulations of Cloud Cavitation Collapse,” by Diego Rossinelli, Babak Hejazialhosseini, Panagiotis Hadjidoukas and Petros Koumoutsakos, all of ETH Zurich, Costas Bekas and Alessandro Curioni of IBM Zurich Research Laboratory, and Steffen Schmidt and Nikolaus Adams of Technical University Munich.
  • • The Best Poster Award was presented to “Optimizations of a Spectral/Finite Difference Gyrokinetic Code for Improved Strong Scaling Toward Million Cores,” by Shinya Maeyama, Yasuhiro Idomura and Motoki Nakata of the Japan Atomic Energy Agency and Tomohiko Watanabe, Masanori Nunami and Akihiro Ishizawa of the National Institute for Fusion Science. • The inaugural SC Test of Time Award was presented to William Pugh from the University of Maryland for his seminal paper, “The Omega Test: a fast and practical integer programming algorithm for dependence analysis,” published in the proceedings of Supercomputing ’91.
  • The 2013-2014 ACM Athena Lecturer, Katherine Yelick of Lawrence Berkeley National Laboratory and the University of California, was recognized during the conference keynote session and presented her lecture during the conference. FLOPS/Dollar: in the Student Cluster Competition Commodity Track, teams were allowed to spend no more than $2,500 and the cluster had to stay within a 15-amp power limit. The overall winning team of the Commodity Track was from Bentley University, Waltham, Massachusetts, and Northeastern University, Boston.
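
About the Best Paper mentioned above: here is a minimal MPI-3 one-sided (RMA) sketch in mpi4py, shown only to illustrate the window/put/fence programming style the paper targets; it says nothing about the paper's own scalable implementation.

```python
# Minimal MPI-3 one-sided (RMA) example with mpi4py: rank 0 writes directly
# into rank 1's exposed memory window, with no matching receive on rank 1.
# Run with e.g.: mpiexec -n 2 python rma_put.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.zeros(1, dtype='i')          # memory this rank exposes
win = MPI.Win.Create(local, comm=comm)  # collective window creation

win.Fence()                             # open an access/exposure epoch
if rank == 0 and comm.Get_size() > 1:
    msg = np.array([42], dtype='i')
    win.Put(msg, target_rank=1)         # one-sided write into rank 1's window
win.Fence()                             # close the epoch; data is now visible

if rank == 1:
    print("rank 1 window holds", local[0])
win.Free()
```
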
  • The November 2013 Top500 The total combined performance of all 500 systems on the list is 250 Pflop/s. Half of the total performance is achieved by the top 17 systems on the list, with the other half of total performance spread among the remaining 483 systems.
  • • In all, there are 31 systems with performance greater than a petaflop/s on the list, an increase of five compared to the June 2013 list. • The No. 1 system, Tianhe-2, and the No. 7 system, Stampede, are using Intel Xeon Phi processors to speed up their computational rate. The No. 2 system Titan and the No. 6 system Piz Daint are using NVIDIA GPUs to accelerate computation. • A total of 53 systems on the list are using accelerator/co-processor technology, unchanged from June 2013. Thirty-eight (38) of these use NVIDIA chips, two use ATI Radeon, and there are now 13 systems with Intel MIC technology (Xeon Phi). • Intel continues to provide the processors for the largest share (82.4 percent) of TOP500 systems. • Ninety-four percent of the systems use processors with six or more cores and 75 percent have processors with eight or more cores. • The number of systems installed in China has now stabilized at 63, compared with 65 on the last list. China occupies the No. 2 position as a user of HPC, behind the U.S. but ahead of Japan, UK, France, and Germany. Due to Tianhe-2, China this year also took the No. 2 position in the performance share, topping Japan. • The last system on the newest list was listed at position 363 in the previous TOP500.
  • A New Benchmark: Improved ranking test for supercomputers to be released by Sandia. Sandia National Laboratories researcher Mike Heroux leads development of a new supercomputer benchmark, High Performance Conjugate Gradient (HPCG), ~4000 LOC. http://mantevo.org/
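
HPCG itself is a preconditioned conjugate-gradient solve over a large sparse problem, run with MPI and OpenMP; purely as a reminder of the kernel pattern it stresses, here is a toy dense, unpreconditioned CG in NumPy (nothing like the benchmark code itself).

```python
# Toy conjugate-gradient solver for a symmetric positive-definite system.
# HPCG exercises the same building blocks (matrix-vector products, dot
# products, vector updates) at scale, on sparse data, with preconditioning.
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small SPD test problem: a 1-D Laplacian.
n = 100
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
b = np.ones(n)
x = cg(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))
```
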
  • http://green500.org/news/green500-list-november-2013 The Green500: #1 TSUBAME-KFC, GSIC Center, Tokyo Institute of Technology; #2 Wilkes, Cambridge University; #3 HA-PACS TCA, Center for Computational Sciences, University of Tsukuba; #4 Piz Daint, Swiss National Supercomputing Centre (CSCS); #5 romeo, ROMEO HPC Center, Champagne-Ardenne; #6 TSUBAME 2.5, GSIC Center, Tokyo Institute of Technology; #7 University of Arizona; #8 Max-Planck-Gesellschaft MPI/IPP; #9 Financial Institution; #10 CSIRO GPU Cluster, CSIRO
  • Continuing the trend from previous years, heterogeneous supercomputing systems totally dominate the top 10 spots of the Green500. A heterogeneous system uses computational building blocks that consist of two or more types of “computing brains.” These types of computing brains include traditional processors (CPUs), graphics processing units (GPUs), and co-processors. In this edition of the Green500, one system smashes through the 4-billion floating-point operations per second (gigaflops) per watt barrier. TSUBAME-KFC, a heterogeneous supercomputing system developed at the Tokyo Institute of Technology (TITech) in Japan, tops the list with an efficiency of 4.5 gigaflops/watt. Each computational node within TSUBAME-KFC consists of two Intel Ivy Bridge processors and four NVIDIA Kepler GPUs. In fact, all systems in the top ten of the Green500 use a similar architecture, i.e., Intel CPUs combined with NVIDIA GPUs. Wilkes, a supercomputer housed at Cambridge University, takes the second spot. The third position is filled by the HA-PACS TCA system at the University of Tsukuba. Of particular note, this list also sees two petaflop systems, each capable of computing over one quadrillion operations per second, achieve an efficiency of over 3 gigaflops/watt, namely Piz Daint at Swiss National Supercomputing Center and TSUBAME 2.5 at Tokyo Institute of Technology. Thus, Piz Daint is the greenest petaflop supercomputer on the Green500. As a point of reference, Tianhe-2, the fastest supercomputer in the world according to the Top500 list, achieves an efficiency of 1.9 gigaflops/watt.
  • SC13 Exhibition
  • SciNet
  • Various new tools, products and other issues that I came across
  • OpenMP 4 • http://openmp.org/wp/openmp-40-api-at-sc13/ • “OpenMP 4.0 is a big step towards increasing user productivity for multi- and many-core programming”, says Dieter an Mey, Leader of the HPC Team at RWTH Aachen University. “Standardizing accelerator programming, adding task dependencies, SIMD support, cancellation, and NUMA awareness will make OpenMP an even more attractive parallel programming paradigm for a growing user community.” • “The latest OpenMP 4.0 release will provide our HPC users with a single language for offloading computational work to Xeon Phi coprocessors, NVIDIA GPUs, and ARM processors”, says Kent Milfeld, Manager, HPC Performance & Architecture Group of the Texas Advanced Computing Center. “Extending the base of OpenMP will encourage more researchers to take advantage of attached devices, and to develop applications that support multiple architectures.”
  • Mentor Graphics has developed OpenACC extensions that will be supported in mainstream GCC compilers.
  • AWS Launches ‘Ivy Bridge’-backed EC2 Instance Type
  • NVIDIA Announces CUDA 6 • Unified Memory – Simplifies programming by enabling applications to access CPU and GPU memory without the need to manually copy data from one to the other, and makes it easier to add support for GPU acceleration in a wide range of programming languages. • Drop-in Libraries – Automatically accelerates applications’ BLAS and FFTW calculations by up to 8X by simply replacing the existing CPU libraries with the GPU-accelerated equivalents. • Multi-GPU Scaling – Re-designed BLAS and FFT GPU libraries automatically scale performance across up to eight GPUs in a single node, delivering over nine teraflops of double precision performance per node, and supporting larger workloads than ever before (up to 512GB). Multi-GPU scaling can also be used with the new BLAS drop-in library.
  • Nvidia Unleashes Tesla K40 The Tesla K40 GPU accelerator has double the memory of the Tesla K20X, until now Nvidia's top GPU accelerator, and delivers a 40 percent performance boost over its predecessor. The Tesla K40 is based on Nvidia's Kepler graphics processing architecture and sports 2,880 GPU cores supporting the graphics chip maker's CUDA parallel programming language. The most powerful graphics platform Nvidia has built to date has a whopping 12GB of GDDR5 memory and supports the PCIe 3.0 interconnect.
  • 1.21 petaFLOPS (RPeak), 156,000-core CycleCloud HPC runs 264 years of Materials Science
  • SDSC Uses Meteor Raspberry Pi Cluster to Teach Parallel Computing
  • Xeon Phi
  • htop
  • zSpace Immersive, 3D Display Technology https://www.youtube.com/watch?v=pw_n58fUu-c zspace.com
  • Dark Silicon: A LANDSCAPE OF THE NEW DARK SILICON DESIGN REGIME, Michael B. Taylor, University of California, San Diego
  • Petaflops of Xeon Phi in a Rack by RSC
  • A few pictures…