This document discusses high-performance computing and numerical algorithms for large-scale scientific simulations. It describes how modern HPC architectures have increasing numbers of cores but declining memory per core, posing challenges for numerical algorithms. High-order numerical methods that efficiently use computing power while maintaining solution accuracy are needed. The document also provides an overview of the FLASH astrophysics simulation code, its capabilities, and examples of simulations performed.
The document provides an overview of a lecture on artificial neural networks and the multilayer perceptron model. It discusses the basics of ANNs, including their ability to learn from examples without being explicitly programmed. It then describes the multilayer perceptron model and the backpropagation algorithm for training neural networks in a supervised manner by minimizing error. Key aspects covered include the network architecture of perceptrons with multiple layers and nonlinear activation functions, as well as the backward propagation of errors to update weights to reduce error.
This document discusses high-order numerical methods for predictive science on large-scale high-performance computing architectures. It covers three main topics: 1) High performance computing and how modern architectures have increasing numbers of cores but declining memory per core, requiring a shift in numerical algorithms. 2) Ideas on high-order numerical methods that achieve higher accuracy with fewer grid points by using higher-order approximations. 3) The importance of validating and verifying simulations against theoretical solutions and experiments for predictive science.
AMS 250 - High-Performance, Massively Parallel Computing with FLASH (dongwook159)
Dongwook Lee presented on optimizing the FLASH astrophysics simulation code for massively parallel systems. FLASH uses adaptive mesh refinement and solves hydrodynamics and MHD equations. It scales well to over 100,000 processors. Lee discussed adding OpenMP directives to take advantage of multi-threading on IBM's Blue Gene/Q architecture. Two threading strategies - assigning blocks or cells to threads - were tested, with finer-grained assignments performing better. Further optimizations like array reordering reduced runtime. The talk showed how large simulation codes can be adapted to emerging supercomputing architectures.
1. Space, Time, Power: Evolving Concerns for Parallel Algorithms February 2008
2. Real and Abstract Parallel Systems • Space: where are the processors located? • Time: how does location affect the time of algorithms? • Power: what happens when power is a constraint?
3. Some Real Systems: IBM BlueGene/L • 212,992 CPUs, 478 Tflops • #1 supercomputer since 11/04 • At Lawrence Livermore Nat’l Lab • ≈ $200 million • 3-D toroidal interconnect • Max distance ≈ (# proc)^(1/3)
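The (# proc)^(1/3) scaling of the maximum distance follows directly from the 3-D torus topology: each axis holds the cube root of the node count, and wraparound links halve the per-axis distance. A quick illustrative sketch of the arithmetic (function name is my own):

```python
# Worst-case hop distance on a d-dimensional torus with n nodes per axis:
# wraparound links mean each axis contributes at most n // 2 hops, so the
# diameter grows like (# proc)**(1/3) in 3-D.
def torus_diameter(n, d=3):
    """Diameter (maximum hop count) of a d-dimensional n x ... x n torus."""
    return d * (n // 2)

# A 32 x 32 x 32 torus (32,768 nodes) has diameter 3 * 16 = 48 hops.
print(torus_diameter(32))  # 48
```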
4. Another Real System: ZebraNet (PI: M. Martonosi)
5. Location, Location, Location • Processors may only be able to communicate with nearby processors • or, time to communicate is a function of distance • or, many processors trying to communicate to ones far away can create communication bottleneck • Feasible, efficient programs need to take location into account
6. What if Space is actually Computers? Cellular Automata • Finite automata whose next state depends on the current state and neighbors’ states: location matters! • ≈ 1950: von Neumann used them as a model of parallelism and interaction in space • Other research: Burks et al. at UM, Conway, Wolfram, … • Can model leaf growth, traffic flow, etc.
7. Parallel Algorithms: Time • A maze of black/white pixels, one per processor, in a CA: can I get out? • Nature-like propagation algorithm: time linear in area • Beyer, Levialdi ≈ 1970: time linear in edge length • CA as a parallel computer, not just a nature simulator
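The nature-like propagation on the last slide can be sketched as a synchronous CA-style flood: in each step, every white cell joins the reached set when a neighbor has been reached, and escape means the wave touches the border. This is an illustrative sketch under my own naming, not the Beyer-Levialdi construction itself:

```python
# CA-style maze propagation: all cells update in lock-step from their
# 4 neighbours. grid[r][c] == 0 is a white (open) pixel, 1 is black (wall).
# Time is proportional to the number of synchronous steps, bounded by area.
def maze_escapes(grid, start):
    rows, cols = len(grid), len(grid[0])
    reached = {start}
    while True:
        frontier = set()
        for (r, c) in reached:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols \
                        and grid[nr][nc] == 0 and (nr, nc) not in reached:
                    frontier.add((nr, nc))
        if not frontier:
            break
        reached |= frontier          # one synchronous CA step
    # "out" means the wave reached the border of the pixel array
    return any(r in (0, rows - 1) or c in (0, cols - 1) for r, c in reached)
```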
This document provides information about a Logic Design II course taught by Dr. Ihab Talkhan. The course covers advanced digital circuit design topics including sequential circuits, finite state machines, and programmable logic devices. It consists of 2-hour lectures per week. Assignments will include problems from specified textbooks. Grading will be based on a midterm exam, assignments, and a final exam. The course aims to teach students about digital logic design principles and the use of CAD tools for implementation.
Materials Modelling: From theory to solar cells (Lecture 1) (cdtpv)
This document provides an overview of a mini-module on materials modelling for solar energy applications. It introduces the lecturers and outlines the course structure, which includes lectures on modelling, interfaces, and multi-scale approaches. It also describes a literature review activity where students will present a research paper using materials modelling in photovoltaics. Recommended textbooks are provided on topics like bonding in solids, computational chemistry, and density functional theory for solids.
Virus, Vaccines, Genes and Quantum - 2020-06-18 (Aritra Sarkar)
This document discusses using a quantum computer to simulate DNA-based vaccines by indexing and aligning short DNA reads to a reference genome. The procedure has three steps: 1) prepare a superposition over the indexed segments of the reference genome, 2) evolve the state via controlled operations to encode the Hamming distance against the short read, and 3) find the maximum-probability entry, which indicates the alignment index.
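A classical analogue of the alignment step described above computes the Hamming distance of the read at every offset of the reference and takes the argmin, which plays the role of the maximum-probability readout in the quantum formulation (the function name is my own):

```python
# Classical analogue of the quantum alignment: score the short read
# against every index of the reference by Hamming distance and return
# the best-matching offset.
def align_read(reference, read):
    k = len(read)
    distances = [
        sum(a != b for a, b in zip(reference[i:i + k], read))
        for i in range(len(reference) - k + 1)
    ]
    return min(range(len(distances)), key=distances.__getitem__)

print(align_read("ACGTACGT", "GTAC"))  # 2
```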
The document provides an introduction to running the Siesta software package for performing density functional theory (DFT) calculations. It describes the basic input variables needed, including system descriptors, structural parameters, functional and basis set specifications. It also outlines how to run Siesta from the command line and analyze outputs such as electronic band structures, densities of states, and charge densities. Post-processing tools are also summarized.
The document describes the Pochoir stencil compiler, which allows programmers to write specifications for stencil computations in a domain-specific language embedded in C++. The Pochoir compiler then translates these specifications into high-performance parallel Cilk code using an efficient cache-oblivious algorithm called TRAP. Benchmark results show that the Pochoir-generated code runs 2-10 times faster than standard parallel loop implementations for a variety of stencil computations.
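For orientation, here is a naive 1-D heat-equation stencil of the kind a Pochoir specification describes: each point updates from itself and its two neighbors. Pochoir itself embeds the specification in C++ and emits cache-oblivious parallel Cilk code; this Python sketch is only the baseline loop such generated code would outperform:

```python
# One time step of a naive 3-point heat stencil with fixed endpoints.
def heat_step(u, alpha=0.25):
    return [u[0]] + [
        u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]

u = [0.0, 0.0, 1.0, 0.0, 0.0]
print(heat_step(u))  # [0.0, 0.25, 0.5, 0.25, 0.0]
```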
In this deck from the DOE CSGF Program Review meeting, Nicholas Frontiere from the University of Chicago presents: HACC - Fitting the Universe Inside a Supercomputer.
"In response to the plethora of data from current and future large-scale structure surveys of the universe, sophisticated simulations are required to obtain commensurate theoretical predictions. We have developed the Hardware/Hybrid Accelerated Cosmology Code (HACC), capable of sustained performance on powerful and architecturally diverse supercomputers to address this numerical challenge. We will investigate the numerical methods utilized to solve a problem that evolves trillions of particles, with a dynamic range of a million to one."
Watch the video: https://wp.me/p3RLHQ-i4l
Learn more: https://www.krellinst.org/csgf/conf/2017/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an... (Storti Mario)
The document discusses advances in solving the Navier-Stokes equations on GPU hardware. It describes how GPUs have many cores designed for parallel computation. Solving PDEs like the Navier-Stokes equations on GPUs requires algorithms that can take advantage of parallelism, such as finite difference methods on structured grids. FFT solvers for the Poisson equation achieve fast O(N log N) performance on GPUs. Methods like IOP and AGP can solve Poisson problems with embedded geometries by iteratively projecting solutions to satisfy divergence and boundary conditions.
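The O(N log N) FFT Poisson solve mentioned above rests on the Laplacian being diagonal in Fourier space on a periodic grid: one forward transform, a pointwise divide by -k², and one inverse transform. A minimal NumPy sketch (the function name and grid conventions are my own, not taken from the talk):

```python
import numpy as np

# Solve the periodic 2-D Poisson problem  laplacian(u) = f  spectrally.
# The k = 0 (mean) mode is undetermined on a periodic domain, so it is
# pinned to zero, i.e. the returned solution is mean-free.
def poisson_periodic(f, L=2 * np.pi):
    n = f.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    f_hat = np.fft.fft2(f)
    u_hat = np.zeros_like(f_hat)
    nz = k2 != 0
    u_hat[nz] = -f_hat[nz] / k2[nz]              # divide by -k^2
    return np.real(np.fft.ifft2(u_hat))
```

With u = sin(x) the forcing is f = -sin(x), and the solver recovers u to machine precision, which makes a convenient correctness check.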
This is the slide set of my Octopus-ReEL (Realtime Encephalography Lab) presentation at the GDG-Izmir event held on Nov 3rd, 2018 at the Ege University Computer Engineering Dept.
1. The document discusses barriers to scaling electronic structure methods to large systems, such as the inability of sparse matrix multiplication kernels to achieve strong parallel scaling and entrenched data structures that limit innovation.
2. It proposes a fast, generic, and data local N-body solver approach using new mathematics that is not constrained by row-column data structures and allows a single programming model.
3. Key aspects of this approach include exploiting locality in higher dimensional product volumes through techniques like occlusion-culling, resolving identity iteratively to compress matrices by orders of magnitude, and developing optimized sparse matrix multiplication kernels.
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F... (Storti Mario)
In this article we compare the results obtained with an implementation of the Finite Volume method for structured meshes on GPGPUs with experimental results, and also with a Finite Element code using a boundary-fitted strategy. The example is a fully submerged spherical buoy immersed in a cubic water recipient. The recipient undergoes a harmonic linear motion imposed with a shake table. The experiment is recorded with a high-speed camera, and the displacement of the buoy is obtained from the video with a MoCap (Motion Capture) algorithm. The amplitude and phase of the resulting motion allow the added mass and drag of the sphere to be determined indirectly.
Jeff Johnson, Research Engineer, Facebook at MLconf NYC (MLconf)
Hacking GPUs for Deep Learning: GPUs have revolutionized machine learning in recent years, and have made both massive and deep multi-layer neural networks feasible. However, misunderstandings on why they seem to be winning persist. Many of deep learning’s workloads are in fact “too small” for GPUs, and require significantly different approaches to take full advantage of their power. There are many differences between traditional high-performance computing workloads, long the domain of GPUs, and those used in deep learning. This talk will cover these issues by looking into various quirks of GPUs, how they are exploited (or not) in current model architectures, and how Facebook AI Research is approaching deep learning programming through our recent work.
Current Research on Quantum Algorithms.ppt (DefiantTones)
This document summarizes research on quantum algorithms being conducted at the Institute for Quantum Information and Matter (IQIM). The research objectives include developing improved methods for fault-tolerant quantum computation, new quantum algorithms beyond the hidden subgroup problem, and simulation methods for quantum many-body systems and local quantum systems. Recent progress includes developing quantum algorithms for simulating particle collisions in fermionic quantum field theories and optimal algorithms for preparing topological quantum error correcting codes. Future work will focus on showing problems in quantum field theory are hard, characterizing logical operations in topological codes, and improving bounds on quantum memory times.
Towards Exascale Simulations of Stellar Explosions with FLASH (Ganesan Narayanasamy)
- ORNL is managed by UT-Battelle for the US Department of Energy and conducts research including simulations of stellar explosions using the FLASH code.
- The research aims to prepare FLASH to run on the upcoming Summit supercomputer by accelerating components like the nuclear kinetics module using GPUs.
- Preliminary results show significant speedups from using GPUs for large nuclear reaction networks that were previously too computationally expensive.
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) (Wim Vanderbauwhede)
This document provides a historical overview of the evolution of FPGA technology and programming approaches over several decades. It discusses early theoretical foundations in the 1930s-40s and the development of integrated circuits, hardware description languages, and high-level synthesis tools from the 1950s onwards. More recently, it describes the rise of heterogeneous computing using GPUs, FPGAs and other accelerators, and the ongoing challenges around programming such systems at a suitable level of abstraction.
A few fundamental concepts in digital electronics (Joy Prabhakaran)
A simple and fun exploration of the conceptual building blocks that form the bedrock of electronics. The focus is almost entirely on digital electronics.
Java Thread and Process Performance for Parallel Machine Learning on Multicor... (Saliya Ekanayake)
The growing use of Big Data frameworks on large machines highlights the importance of performance issues and the value of High Performance Computing (HPC) technology. This paper looks carefully at three major frameworks, Spark, Flink, and the Message Passing Interface (MPI), both in scaling across nodes and internally over the many cores inside modern nodes. We focus on the special challenges of the Java Virtual Machine (JVM) using an Intel Haswell HPC cluster with 24 cores per node. Two parallel machine learning algorithms, K-Means clustering and Multidimensional Scaling (MDS), are used in our performance studies. We identify three major issues – thread models, affinity patterns, and communication mechanisms – as factors affecting performance by large factors, and show how to optimize them so that Java can match the performance of traditional HPC languages like C. Further, we suggest approaches that preserve the user interface and elegant dataflow approach of Flink and Spark but modify the runtime so that these Big Data frameworks can achieve excellent performance and realize the goals of HPC-Big Data convergence.
Density functional theory (DFT) uses the Hohenberg-Kohn-Sham theory to map the many-body Schrödinger equation onto a single-body problem. VASP is a DFT software package that uses plane wave basis sets, pseudopotentials to approximate core electrons, and periodic boundary conditions to model materials with up to 200 atoms. VASP input files include INCAR for calculation parameters, POSCAR for geometry, POTCAR for pseudopotentials, and KPOINTS for k-point meshes.
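As an illustration of the input files listed above, a minimal INCAR fragment setting a few common calculation parameters might look like the following (values are hypothetical, chosen only to show the format):

```
# INCAR — calculation parameters (illustrative values)
ENCUT  = 400      # plane-wave cutoff energy in eV
ISMEAR = 0        # Gaussian smearing of partial occupancies
SIGMA  = 0.05     # smearing width in eV
```

The geometry (POSCAR), pseudopotentials (POTCAR), and k-point mesh (KPOINTS) are supplied in their own files alongside this one.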
This document summarizes modeling efforts of stellar explosions using astrophysics simulation codes like Maestro, Castro, and BoxLib. It discusses the multiscale challenges of modeling convection, turbulence, nuclear burning, and rotation in stars, as well as the temporal challenge of processes spanning timescales from millions of years down to seconds. The document outlines the types of approximations commonly used and the diversity of codes in the field. It provides details on the hydrodynamic schemes, grids, and divergence constraints used in different codes. Finally, it summarizes results on modeling Type Ia supernovae, helium convection in sub-Chandrasekhar white dwarfs, and white dwarf mergers.
The document discusses quantum computing and its potential applications and implications. It notes that quantum computers could provide exponential speedups for some computations, such as factoring, thereby breaking current cryptography, but only quadratic speedups for many optimization problems. While they would speed up simulations of quantum systems, no speedups are known for many other problems. Quantum information is also useful for applications beyond computation, such as cryptography, communication, and metrology. Overall, quantum computing is an exciting area of basic science, but practical quantum computers able to outperform classical ones may still be decades away.
A shared-filesystem-memory approach for running IDA in parallel over informal... (openseesdays)
This document describes a method for running incremental dynamic analysis (IDA) in parallel over computer clusters to reduce computation time. The method distributes IDA tasks across multiple CPUs by either: (1) distributing individual seismic records to different CPUs or (2) further distributing the runs within each record to additional CPUs. This achieves near linear speedup. The method is applied to a case study building to demonstrate a reduction in analysis time from 40 hours to less than 10 hours using 20 CPUs. Monte Carlo simulations are also discussed to quantify modeling parameter uncertainties through approximate IDA techniques.
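The record-level distribution in strategy (1) can be sketched with a worker pool mapping each seismic record to its own independent analysis. Here `run_ida` is a hypothetical stand-in for the real per-record analysis, and a thread pool stands in for the cluster CPUs the paper actually distributes work over:

```python
from concurrent.futures import ThreadPoolExecutor

def run_ida(record):
    # Placeholder: a real implementation would run the full incremental
    # dynamic analysis for this ground-motion record.
    return {"record": record, "runs": len(record)}

def parallel_ida(records, workers=4):
    # Each record is analyzed independently, so the records map cleanly
    # onto workers and the speedup is near linear, as the paper reports.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_ida, records))
```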
The document discusses the challenges of preserving data from high-energy physics (HEP) experiments. It notes that HEP experiments produce huge amounts of data that require extensive storage, software, and computing resources to analyze. However, many past HEP facilities did not have long-term strategies for archiving and preserving their data. Developing methods for permanent preservation, reuse, and open access to HEP data presents significant technical and financial challenges but is important given the huge costs of generating the data.
Benchmark Calculations of Atomic Data for Modelling Applications (AstroAtom)
This document summarizes benchmark calculations of atomic data for modeling applications. It discusses numerical methods like close-coupling and distorted-wave approaches for calculating atomic collision data. It provides selected results on energy levels, oscillator strengths, and electron-impact excitation cross sections. It also discusses applications to modeling neon discharges and takes a closer look at ionization calculations and examples. The document concludes by discussing the production and assessment of atomic data and outlines challenges in obtaining reliable data from both experiments and calculations.
The document discusses the challenges of preserving data from high-energy physics (HEP) experiments. It notes that HEP experiments produce huge amounts of data that require extensive storage, software, and computing resources to analyze. However, many past HEP facilities did not have long-term strategies for archiving and preserving their data. Developing methods for permanent preservation, reuse, and open access to HEP data presents significant technical and financial challenges but is important given the huge costs of generating the data.
Benchmark Calculations of Atomic Data for Modelling ApplicationsAstroAtom
This document summarizes benchmark calculations of atomic data for modeling applications. It discusses numerical methods like close-coupling and distorted-wave approaches for calculating atomic collision data. It provides selected results on energy levels, oscillator strengths, and electron-impact excitation cross sections. It also discusses applications to modeling neon discharges and takes a closer look at ionization calculations and examples. The document concludes by discussing the production and assessment of atomic data and outlines challenges in obtaining reliable data from both experiments and calculations.
1. Dongwook Lee
Flash Center at the University of Chicago
Overcoming Challenges in High Performance
Computing: High-order Numerical Methods for
Large-Scale Scientific Computing and Plasma Simulations
Department of Applied Mathematics and Statistics
University of California, Santa Cruz
February 24, 2014
FLASH Simulation of a 3D Core-collapse Supernova
Courtesy of S. Couch
MIRA, BG/Q, Argonne National Lab
49,152 nodes, 786,432 cores
3. High Performance Computing (HPC)
‣To solve large problems in science, engineering, or
business
‣Modern HPC architectures have
▪ increasing number of cores
▪ declining memory/core
‣This trend will continue for the foreseeable future
4. High Performance Computing (HPC)
‣This tension between computation & memory brings a
paradigm shift in numerical algorithms for HPC
‣To enable scientific computing on HPC architectures:
▪ efficient parallel computing, (e.g., data parallelism, task
parallelism, MPI, multi-threading, GPU accelerator, etc.)
▪ better numerical algorithms for HPC
5. Numerical Algorithms for HPC
‣Numerical algorithms should conform to the
abundance of computing power and the scarcity of
memory
‣But…
▪ without losing solution accuracy
▪ maintaining maximum solution stability
▪ faster convergence to “correct” solution
6. Large Scale Astrophysics Codes
▪ FLASH (Flash group, U of Chicago)
▪ PLUTO (Mignone, U of Torino)
▪ CHOMBO (Colella, LBL)
▪ CASTRO (Almgren,Woosley, LBL, UCSC)
▪ MAESTRO (Zingale, Almgren, SUNY, LBL)
▪ ENZO (Bryan, Norman, Abel, Enzo group)
▪ BATS-R-US (CSEM, U of Michigan)
▪ RAMSES (Teyssier, CEA)
▪ CHARM (Miniati, ETH)
▪ AMRVAC (Toth, Keppens, CPA, K.U.Leuven)
▪ ATHENA (Stone, Princeton)
▪ ORION (Klein, McKee, U of Berkeley)
▪ ASTROBear (Frank, U of Rochester)
▪ ART (Kravtsov, Klypin, U of Chicago)
▪ NIRVANA (Ziegler, Leibniz-Institut für Astrophysik Potsdam), and others
[Figure: computing scales, from giga-scale (current laptop/desktop) through peta-scale (current HPC) toward future HPC (?)]
7. The FLASH Code
‣FLASH is a free, open-source code for astrophysics and HEDP
▪ modular, multi-physics, adaptive mesh refinement (AMR), parallel (MPI & OpenMP), finite-volume Eulerian compressible code for solving hydrodynamics and MHD
▪ professionally software engineered and maintained (daily regression test suite, code verification/validation), inline/online documentation
▪ 8500 downloads, 1500 authors, 1000 papers
▪ FLASH can run on various platforms, from laptops to supercomputing (peta-scale) systems such as IBM BG/P and BG/Q
8. FLASH Simulations
[Figure montage: cosmological cluster formation, supersonic MHD turbulence, Type Ia SN, Rayleigh-Taylor (RT), core-collapse SN (CCSN), ram pressure stripping, laser slab, rigid body structure, accretion torus, LULI/Vulcan experiments (B-field generation/amplification)]
9. FLASH's Multi-Physics Capabilities
High Energy Density Physics
▪ multi-temperature (1T, 2T, & 3T) hydrodynamics and MHD
▪ implicit electron thermal conduction using HYPRE
▪ flux-limited multi-group diffusion approximation for radiative transfer
▪ multi-material support: EoS and opacity (tabular & analytic)
▪ laser energy deposition using ray tracing
▪ rigid body structures
Astrophysics
▪ hydrodynamics, MHD, RHD, cosmology, hybrid PIC
▪ EoS: gamma law, multi-gamma, Helmholtz
▪ nuclear physics and other source terms
▪ external gravity, self-gravity
▪ active and passive particles (used for PIC, laser ray tracing, dark matter, tracer particles)
▪ material properties
✓Fortran, C, Python; > 1.2 million lines (25% comments!)
✓Extensive documentation available in the User's Guide
✓Scalable to tens of thousands of processors with AMR
11. My Collaborators
▪ M. Ruszkowski (U of Michigan)
▪ J. ZuHone (NASA, Goddard)
▪ K. Murawski (UMSC, Poland)
▪ F. Cattaneo (U of Chicago)
▪ P. Ricker (UIUC)
▪ M. Bruggen, R. Banerjee (U of Hamburg)
▪ M. Shin (U of Oxford)
▪ P. Oh, S. Ji (UCSB)
▪ I. Parrish (UCB)
▪ E. Zweibel (UWM)
▪ A. Deane (UMD)
▪ A. Dubey, P. Colella, J. Bachan, C. Daley (LBL)
▪ C. Federrath (Monash U, Australia)
▪ R. Fisher (UMass Dartmouth)
▪ G. Gregori, J. Meinecke (U of Oxford)
▪ P. Drake (U of Michigan)
▪ R. Yurchak (LULI, France)
▪ F. Miniati (ETH, Switzerland)
(grouped into Astrophysics and High Energy Density Physics collaborations)
12. Parallelization, Optimization & Speedup
‣Adaptive Mesh Refinement with Paramesh
▪standard MPI (Message Passing Interface)
parallelism
▪domain decomposition distributed over
multiple processor units
▪distributed memory
[Figure: a single block; uniform grid; octree-based block AMR; patch-based AMR]
14. Parallelization, Optimization & Speedup
‣Multi-threading (shared memory) using OpenMP directives
▪more parallel computing on BG/Q using hardware threads on a core
▪16 cores/node, 4 threads/core
[Figure: "thread block list" vs. "thread within block" strategies, shown for 5 leaf blocks in a single MPI rank with 2 threads/core (or 2 threads/rank)]
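The two threading granularities can be mimicked in a few lines. This is an illustrative sketch only, with toy block/cell counts: FLASH uses OpenMP directives in Fortran, and Python's GIL means these threads demonstrate the work decomposition, not BG/Q performance.

```python
# Sketch (not FLASH source): the two OpenMP-style threading granularities,
# mimicked with Python threads over toy-sized blocks.
from concurrent.futures import ThreadPoolExecutor

NBLOCKS, NCELLS = 5, 16                  # 5 leaf blocks, 16 cells/block (toy sizes)
blocks = [[0.0] * NCELLS for _ in range(NBLOCKS)]

def update_cell(block, i):
    block[i] += 1.0                      # stand-in for a hydro update of one cell

# Strategy 1: "thread block list" -- each thread owns whole blocks
def update_block(block):
    for i in range(len(block)):
        update_cell(block, i)

with ThreadPoolExecutor(max_workers=2) as pool:      # 2 threads/rank
    list(pool.map(update_block, blocks))

# Strategy 2: "thread within block" -- threads split the cells of one block
with ThreadPoolExecutor(max_workers=2) as pool:
    for block in blocks:
        list(pool.map(lambda i, b=block: update_cell(b, i), range(NCELLS)))

# every cell was updated once by each strategy
assert all(abs(c - 2.0) < 1e-12 for b in blocks for c in b)
```

The finer-grained second strategy exposes more parallel work per block, which is why it maps better onto many hardware threads per core.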
18. FV Godunov Scheme for Hyperbolic System
‣The system of conservation laws (hyperbolic PDE) in 1D:

\[ \frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} = 0 \]

‣A discrete integral form (i.e., finite-volume):

\[ U^{n+1}_i = U^n_i - \frac{\Delta t}{\Delta x}\left(F^n_{i+1/2} - F^n_{i-1/2}\right) \]

‣Godunov scheme seeks the time-averaged fluxes at each interface by solving the self-similar solution of the Riemann problem:

\[ U^*_{i+1/2}(x/t) = \mathrm{RP}\big(U^n_i,\, U^n_{i+1}\big), \qquad F^n_{i+1/2} = F\big(U^*_{i+1/2}(0)\big) = F\big(U^n_i,\, U^n_{i+1}\big) \]

[Figure: the Riemann fan in the (x, t) plane between piecewise-constant (first-order) states U^n_i and U^n_{i+1}, with rarefaction, contact discontinuity, and shock waves; U^*(x/t) is the self-similar solution]
20. A Discrete World of FV
‣Piecewise polynomial reconstruction on each cell:

\[ u(x_i, t^n) = P_i(x), \quad x \in (x_{i-1/2},\, x_{i+1/2}) \]

with interface states at x_{i+1/2} given by u_L = P_i(x_{i+1/2}) and u_R = P_{i+1}(x_{i+1/2})
[Figure: reconstruction polynomials on cells x_{i-1}, x_i, x_{i+1}]
21. A Discrete World of FV
‣At each interface we solve a Riemann problem (RP) and obtain F_{i+1/2}
22. A Discrete World of FV
‣We are ready to advance our solution in time and get new volume-averaged states:

\[ U^{n+1}_i = U^n_i - \frac{\Delta t}{\Delta x}\left(F_{i+1/2} - F_{i-1/2}\right) \]
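The reconstruct/Riemann-solve/update cycle collapses nicely in the piecewise-constant first-order case, where the Riemann problem for linear advection has the exact upwind solution. A minimal sketch under those assumptions (a toy scalar problem, not FLASH code):

```python
# A sketch of the first-order FV Godunov update for linear advection
# u_t + a u_x = 0 with a > 0. The Riemann problem at each interface is solved
# exactly: the self-similar state at x/t = 0 is the upwind (left) state,
# so F_{i+1/2} = a * U_i.
import math

def godunov_advection(u, a, dt, dx, nsteps):
    n = len(u)
    for _ in range(nsteps):
        # interface fluxes from the Riemann solution at x/t = 0 (periodic domain)
        flux = [a * u[(i - 1) % n] for i in range(n + 1)]   # flux[i] = F_{i-1/2}
        u = [u[i] - dt / dx * (flux[i + 1] - flux[i]) for i in range(n)]
    return u

# advect a sine wave once around a periodic domain of length 1
N, a = 200, 1.0
dx = 1.0 / N
dt = 0.5 * dx / a                                  # Courant number Ca = 0.5
u0 = [math.sin(2 * math.pi * (i + 0.5) * dx) for i in range(N)]
u = godunov_advection(u0, a, dt, dx, nsteps=int(round(1.0 / (a * dt))))
err = sum(abs(ui - u0i) for ui, u0i in zip(u, u0)) * dx    # L1 error vs exact
```

The scheme is monotone (each update is a convex combination of old states), so it creates no new extrema, at the cost of the heavy first-order diffusion discussed on the following slides.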
23. It Gets Much More Complicated in Reality
▪In 3D we have 6 interfaces per cell
▪2 transverse RPs per interface
▪12 RPs are needed for the maximum Courant stability limit, Ca ~ 1
▪Expensive!
24. Computational Advantages In Unsplit Solvers
‣New Efficient Unsplit Algorithm (Lee & Deane, 2009; Lee, 2013)
‣Most unsplit schemes need 12 Riemann solves (see Table)
‣3D Unsplit solvers in FLASH need
▪ 3 Riemann solves in hydro & 6 Riemann solves in MHD with
maximum Courant stability limit, Ca ~ 1
Mignone et al. 2007
25. Stability, Consistency and Convergence
‣Lax Equivalence Theorem (for linear problem, P. Lax, 1956)
▪The only convergent schemes are those that are both consistent and stable
▪Hard to show that the numerical solution converges to the original
solution of PDE; relatively easy to show consistency and stability of
numerical schemes
‣In practice, the linear theory is adopted as guidance for non-linear problems
▪code verification (code-to-code comparison)
▪code validation (code-to-experiment, code-to-analytical solution
comparisons)
▪self-convergence test over grid resolutions (a good measurement for
numerical accuracy)
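A self-convergence test of the kind described above can be sketched in a few lines (a toy first-order upwind advection problem is assumed; `l1_error` is illustrative, not a FLASH routine). Halving the grid spacing should roughly halve the L1 error for a first-order scheme:

```python
# Self-convergence over grid resolutions: estimate the observed order of
# accuracy of first-order upwind advection of a sine wave over one period.
import math

def l1_error(N, a=1.0, cfl=0.5):
    dx = 1.0 / N
    dt = cfl * dx / a
    u = [math.sin(2 * math.pi * (i + 0.5) * dx) for i in range(N)]
    exact = u[:]                                   # after one period, exact = initial
    for _ in range(int(round(1.0 / (a * dt)))):    # periodic wrap via u[-1]
        u = [u[i] - a * dt / dx * (u[i] - u[i - 1]) for i in range(N)]
    return sum(abs(x - y) for x, y in zip(u, exact)) * dx

e_coarse, e_fine = l1_error(100), l1_error(200)
order = math.log(e_coarse / e_fine, 2)   # observed order; ~1 for a 1st-order scheme
```

The same refinement study applied to a high-order scheme would show the error dropping much faster per doubling, which is exactly the memory-for-computation tradeoff argued for in this talk.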
26. High-Order Polynomial Reconstruction
• Godunov's order-barrier theorem (1959): monotonicity-preserving advection schemes are at most first-order! (Oh no…)
• Only true for linear PDE theory (YES!)
• High-order "polynomial" schemes became available using non-linear slope limiters (70's and 80's: Boris, van Leer, Zalesak, Colella, Harten, Shu, Engquist, etc.)
• Can't avoid oscillations completely (non-TVD); instability can grow (numerical INSTABILITY!)
[Figure: FOG vs. PLM vs. PPM reconstructions]
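The minmod limiter is a minimal concrete example of such a non-linear slope limiter. The sketch below (an assumed 1D scalar setup, not FLASH code) shows that minmod-limited PLM face values create no new extrema at a step, while the unlimited central slope over- and undershoots (a Gibbs-like oscillation):

```python
# Limited vs. unlimited PLM reconstruction of a step profile.
def minmod(a, b):
    # zero slope at extrema/opposite signs; otherwise the smaller-magnitude slope
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

u = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]            # step profile (cell averages)

def interface_values(u, limited):
    vals = []
    for i in range(1, len(u) - 1):
        central = 0.5 * (u[i + 1] - u[i - 1])
        slope = minmod(u[i] - u[i - 1], u[i + 1] - u[i]) if limited else central
        vals += [u[i] - 0.5 * slope, u[i] + 0.5 * slope]  # left/right face values
    return vals

lim = interface_values(u, limited=True)
unlim = interface_values(u, limited=False)
assert 0.0 <= min(lim) and max(lim) <= 1.0    # limited: stays within [0, 1]
assert min(unlim) < 0.0 and max(unlim) > 1.0  # unlimited: new extrema at the step
```

The limiter makes the reconstruction non-linear even for a linear PDE, which is precisely how these schemes sidestep Godunov's order barrier.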
28. Low-Order vs. High-Order
[Figure: 1st-order, high-order, and reference solutions]
▪1st order: 3200 cells (50 MB), 160 sec, 3828 steps
▪High-order: 200 cells (10 MB), 9 sec, 266 steps
29. Circularly Polarized Alfven Wave (CPAW)
▪A CPAW problem propagates smoothly varying oscillations of the transverse components of velocity and magnetic field
▪The initial condition is an exact nonlinear solution of the MHD equations
▪The decay of the maxima of Vz and Bz is solely due to numerical dissipation: a direct measurement of numerical diffusion (Ryu, Jones & Frank, ApJ, 1995)
[Fig. A.3 of Mignone et al., JCP 229 (2010) 5896-5920: long-term decay of circularly polarized Alfvén waves after 16.5 time units, corresponding to ~100 wave periods]
30. Outperformance of High-Order: CPAW
(L1 norm error and avg. comp. time/step at 32 vs. 256 cells: 0.221 (×5/3) sec vs. 38.4 sec; source: Mignone & Tzeferacos, 2010, JCP)
▪PPM-CT (overall 2nd order): 2h42m50s
▪MP5 (5th order): 15 s × (5/3) = 25 s
▪More computational work & less memory
▪Better suited for HPC
▪Easier in FD; harder in FV
▪High-order schemes are better at preserving solution accuracy on AMR
31. Numerical Oscillations
‣ In general, numerical (spurious) oscillations arise
‣near steep gradients (Gibbs oscillations)
‣from lack of numerical dissipation (high-order schemes)
‣from lack of numerical stability (Courant condition)
‣if present, the numerical solution is invalid
‣Controlling oscillations is crucial for solution accuracy & stability
‣More complicated situations (see LeVeque):
‣carbuncle/even-odd decoupling instability (Quirk, 1992)
‣start-up error
‣slow-moving shocks (see next)
32. Numerical Oscillations
‣Slow-moving shocks:
‣unphysical oscillations can grow exponentially in time, especially near strong and slowly moving shocks (Woodward & Colella, 1984)
‣Jin & Liu (1996); Donat & Marquina (1996); Karni & Canic (1997); Arora & Roe (1997); Stiriba & Donat (2003); Lee (2010)
First-order Godunov method (Source: LeVeque)
33. PPM Oscillations for Slow-Moving Shocks
✓Standard 3rd order PPM suffers from unphysical oscillations for the MHD Brio-Wu shock tube (Brio & Wu, 1988)
✓A fix is available by applying an upwind slope limiter for PPM (Lee, 2010)
✓Upwind PPM behaves very similarly to WENO5, reducing oscillations!
[Figure: standard PPM (Colella & Woodward, 1984) — bad!; upwind PPM (Lee, 2010) — improved!; WENO5 (Jiang & Shu, 1996); reference solution]
34. Traditional High-Order Schemes
‣Traditional approaches to an (N+1)th-order scheme take an Nth-degree polynomial
▪only along the normal direction (e.g., FOG, MH, PPM, ENO, WENO, etc.)
▪with monotonicity controls (e.g., slope limiters, artificial viscosity)
‣High-order in FV is tricky (when compared to FD)
▪volume-averaged quantities (quadrature rules)
▪preserving conservation w/o losing accuracy
▪the higher the order, the larger the stencil
▪high-order temporal update (ODE solvers, e.g., RK3, RK4, etc.)
[Figure: 2D stencils for 2nd-order PLM and 3rd-order PPM]
35. High-Order using Gaussian Processes (GP)
▪ Gaussian Processes (GP) are a class of stochastic processes that yield sampling data from a function that is probabilistically constrained, but not exactly known
▪C. Graziani, P. Tzeferacos & D. Lee (Flash, U of Chicago)
▪Our high-order GP interpolation scheme is based on:
▪ samples (i.e., volume-averaged data points) of the function
▪ train the GP model on the samples by means of Bayes’ theorem
▪ the posterior mean function is our high-order interpolant of the
unknown function
▪The result is to pass from an “agnostic” prior model
(a mean function and a covariance kernel) to a data-informed
posterior model (an updated mean function and covariance)
36. Agnostic Prior Model
GP is defined through
(1) a mean function, and
(2) a symmetric positive-definite integral kernel K(x,y) (the covariance function)
‣ The likelihood function is the probability of f given the GP model
37. Data-Informed Posterior Model
‣Want to predict an unknown function f probabilistically at a new point x*
‣The augmented likelihood function then includes the new point
The result is to pass from an agnostic prior model (a mean function and a covariance kernel) to a data-informed posterior model (an updated mean function and covariance)
38. Updated Mean Function
‣Bayes' Theorem gives the updated mean function
The result is to pass from an agnostic prior model (a mean function and a covariance kernel) to a data-informed posterior model (an updated mean function and covariance)
Our high-order interpolated value: a Gaussian probability distribution on the unknown function value f*
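The relations behind slides 36-38 can be written out explicitly; the following is the textbook GP-regression form (the notation here is assumed, since the slide equations were not captured in this transcript):

```latex
% Prior (agnostic) model: mean m and covariance kernel K
\mathbf{f} \sim \mathcal{N}(\mathbf{m}, K), \qquad K_{ij} = K(x_i, x_j)

% Augmented likelihood for a new point x_*:
\begin{pmatrix} \mathbf{f} \\ f_* \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \mathbf{m} \\ m_* \end{pmatrix},
\begin{pmatrix} K & \mathbf{k}_* \\ \mathbf{k}_*^{T} & k_{**} \end{pmatrix}
\right), \qquad (\mathbf{k}_*)_i = K(x_i, x_*)

% Bayes' theorem yields the data-informed posterior:
f_* \mid \mathbf{f} \sim \mathcal{N}\!\left(
m_* + \mathbf{k}_*^{T} K^{-1} (\mathbf{f} - \mathbf{m}),\;
k_{**} - \mathbf{k}_*^{T} K^{-1} \mathbf{k}_* \right)
```

The posterior mean is the high-order interpolant; the posterior variance quantifies the remaining uncertainty in the interpolated value f*.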
39. Truly Multidimensional Use of Stencil
The current GP interpolation method in FLASH is for smooth flow tests. For this, we use the squared exponential (SE) covariance and interpolate on a "blocky sphere" stencil of radius R
‣ The SE covariance has C∞ native functions, and thus can provide spectral convergence rates when the underlying approximated function is itself C∞
[Figure: 2D stencils for GP, 2nd-order PLM, and 3rd-order PPM]
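A minimal sketch of GP interpolation with the SE kernel follows. Everything here is illustrative, not the FLASH implementation: a 1D stand-in for the "blocky sphere" stencil, a zero prior mean, and an assumed hyperparameter `ell`.

```python
# GP interpolation sketch: condition on samples f, then the posterior mean
# at x_* (with zero prior mean) is k_*^T K^{-1} f.
import math

def se_kernel(x, y, ell=0.5):
    # squared exponential (SE) covariance
    return math.exp(-0.5 * ((x - y) / ell) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (small dense systems only)
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_interpolate(xs, fs, xstar):
    K = [[se_kernel(xi, xj) for xj in xs] for xi in xs]
    w = solve(K, fs)                                  # w = K^{-1} f
    return sum(se_kernel(xstar, xi) * wi for xi, wi in zip(xs, w))

# interpolate a smooth function from 5 samples (1D stand-in for the stencil)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
fs = [math.sin(x) for x in xs]
approx = gp_interpolate(xs, fs, 0.5)
err = abs(approx - math.sin(0.5))
```

Note that the posterior mean reproduces the samples exactly at the sample points; between samples, accuracy improves rapidly as the stencil grows, reflecting the spectral convergence claimed for C∞ functions.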
40. Revisited: 1D Mach 3 Shock
[Figure: PLM on 1600 cells (reference) vs. GP (spectral), WENO-Z (5), PPM (3), PLM (2), FOG (1)]
41. Results on Smooth Flows I
• 2D advection of an isentropic vortex along the domain diagonal on a periodic box (R = 2, ℓ = 6)
42. Results on Smooth Flows II
• 1D advection of a Gaussian profile (R = 2, ℓ = 12)
44. Implicit Solver in Unsplit Hydro
‣Spatial parallelism & optimization (MPI, multi-threading, numerical algorithm improvements, coding optimizations)
‣Temporal optimization:
▪overcome small diffusion (parabolic PDE) time scales
▪Jacobian-Free Newton-Krylov fully implicit solver (e.g., Knoll & Keyes, 2004; Toth et al., 2006) for the unsplit hydro solver
▪ NSF grant (PHY-0903997), 2009-2012, $400K
▪ Dongwook Lee (PI), Guohua Xia (postdoc), Shravan Gopal & Prateeti Mohapatra (scientists)
▪ GMRES with preconditioner
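The JFNK idea can be sketched on a toy problem. Everything below is assumed for illustration: a 1D reaction-diffusion equation, backward Euler, and CG standing in for GMRES (this toy Jacobian is symmetric positive definite, so CG applies; the FLASH solver referenced above is GMRES-based and preconditioned).

```python
# Jacobian-free Newton-Krylov sketch: backward Euler for u_t = u_xx - u^3.
# Newton solves G(u) = u - u_old - dt*(u_xx - u^3) = 0; the Krylov solver
# never forms J, using finite-difference Jacobian-vector products instead.
import math

N, dx, dt = 32, 1.0 / 32, 0.1           # dt far above the explicit limit dx^2/2

def residual(u, u_old):
    # homogeneous Dirichlet boundaries
    g = []
    for i in range(N):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < N - 1 else 0.0
        lap = (left - 2 * u[i] + right) / dx**2
        g.append(u[i] - u_old[i] - dt * (lap - u[i] ** 3))
    return g

def jv(u, u_old, v):
    # J v ≈ (G(u + eps*v) - G(u)) / eps, with the usual JFNK eps scaling
    vn = math.sqrt(sum(vi * vi for vi in v))
    if vn == 0.0:
        return [0.0] * len(v)
    eps = 1e-7 * (1.0 + math.sqrt(sum(ui * ui for ui in u))) / vn
    g0 = residual(u, u_old)
    g1 = residual([ui + eps * vi for ui, vi in zip(u, v)], u_old)
    return [(a - b) / eps for a, b in zip(g1, g0)]

def cg(matvec, b, tol=1e-10, maxit=200):
    # unpreconditioned conjugate gradients, matrix-free
    x = [0.0] * len(b)
    r = b[:]; p = r[:]; rs = sum(ri * ri for ri in r)
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

u = [math.sin(math.pi * (i + 1) / (N + 1)) for i in range(N)]
u_old = u[:]
for _ in range(5):                                   # Newton iterations
    g = residual(u, u_old)
    if max(abs(gi) for gi in g) < 1e-10:
        break
    du = cg(lambda v: jv(u, u_old, v), [-gi for gi in g])
    u = [ui + di for ui, di in zip(u, du)]
res = max(abs(gi) for gi in residual(u, u_old))
```

The payoff is the one claimed on this slide: the implicit step is stable at a dt roughly 50× larger than the explicit parabolic limit, at the cost of a few Krylov solves per step.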
45. Future Applications
[Figure 13 (submitted to ApJ on 2013 October 21): pseudo-color slices of entropy]
‣In core-collapse SN, the sound speed at the core of the proto-neutron star reaches up to ~1/3 of the speed of light
‣This results in a very small time step in those regions, even where there is no shock
‣Hybrid Time Stepping: implicit solutions at the core and explicit solutions elsewhere will bring huge computing acceleration for CCSN
FLASH Simulation of a 3D Core-collapse Supernova (entropy)
Courtesy of S. Couch
46. Summary
• Novel mathematical algorithms and ideas in designing a scientific code, and in performing state-of-the-art simulations, are the key to success in scientific computing on HPC
• High-order methods are a good approach to realize the desired tradeoff between memory and computation on future HPC
• Building a good large-scale scientific code with computational accuracy, stability, efficiency, and modularity, especially with multi-physics capabilities, requires combining various research fields, including mathematics, physics (and other sciences), and computer science
47. Future Work
▪More high-order reconstruction methods
▪High-order quadrature rules for multi-dimensions
▪More studies on GP
▪ covariance kernels, applications to AMR prolongations, more convergence studies, GP for FD, etc.
▪High-order temporal integrations (RK3, RK4, etc.)
▪Hybrid explicit-implicit solvers
▪More fine-grained threading
▪Keep up with HPC trends
▪GPU accelerators
48. FLASH's Recent Computing Time Awards
▪Type Ia SN (105M hr, 2013)
▪Shock-Generated Magnetic Fields (40M hr, 2013)
▪Turbulent Nuclear Combustion (150M hr, 2013)
▪Core-Collapse SN (30M hr, 2013)
51. High-Order Numerical Algorithms
‣ Provide more accurate numerical solutions using
▪ fewer grid points (= memory savings)
▪ higher-order mathematical approximations (promoting floating point operations, or computation)
▪ faster convergence to the solution
52. Unsplit Hydro/MHD Solvers
‣Spatial Evolution (PDE evolution with Finite-volume):
‣Reconstruction Methods:
‣Polynomial-based: 1st order Godunov, 2nd order PLM, 3rd order PPM, 5th order WENO
‣Gaussian Process model-based: spectral-order GP (very new!)
‣Riemann Solvers:
‣Rusanov, HLLE, HLLC, HLLD (MHD), Marquina, Roe, Hybrid
‣Temporal Evolution (ODE evolution):
‣2nd-order characteristic tracing (predictor-corrector type) method