In this deck from the GPU Technology Conference, Wes Armour from the Oxford e-Research Centre discusses the role of GPUs in processing the large volumes of astronomical data collected by the Square Kilometre Array, and why CUDA is well suited to their signal-processing software.
During his session at GTC 2019, Armour talked about AstroAccelerate, a GPU-enabled software package that uses CUDA and NVIDIA GPUs to achieve real-time processing of radio-astronomy data. He stated that “The massive computational power of modern day GPUs allows code to perform algorithms such as de-dispersion, single pulse searching and Fourier Domain Acceleration Searching in real-time on very large data-sets which are comparable to those which will be produced by next generation radio-telescopes such as the SKA.”
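AstroAccelerate's GPU kernels are not shown in the deck, but the core idea of incoherent de-dispersion is easy to sketch: each frequency channel is shifted back by the frequency-dependent dispersion delay before summing. A minimal NumPy version (function name and parameters are illustrative, not AstroAccelerate's API):

```python
import numpy as np

# Standard cold-plasma dispersion constant, in MHz^2 * s * (pc cm^-3)^-1.
K_DM = 4.148808e3

def dedisperse(dynamic_spectrum, freqs_mhz, dm, dt):
    """Shift each frequency channel to undo the dispersive delay.

    dynamic_spectrum : 2-D array, shape (n_chan, n_time)
    freqs_mhz        : centre frequency of each channel in MHz
    dm               : trial dispersion measure in pc cm^-3
    dt               : sampling time in seconds
    """
    f_ref = freqs_mhz.max()
    out = np.empty_like(dynamic_spectrum)
    for i, f in enumerate(freqs_mhz):
        delay_s = K_DM * dm * (f ** -2 - f_ref ** -2)
        shift = int(round(delay_s / dt))
        out[i] = np.roll(dynamic_spectrum[i], -shift)
    return out
```

A pulse-search pipeline evaluates this for thousands of trial DM values, which is why the brute-force loop above is exactly the kind of work that maps well to a GPU.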
Watch the video: https://wp.me/p3RLHQ-kBv
Learn more: https://www.skatelescope.org/the-ska-project/
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Hope you like this presentation about LiDAR flight planning. Contact Modus Robotics if you have more questions about flight planning, point density, swath planning, etc.
Modus Robotics helps organizations collect, analyze, and transform data into actionable information for rapid decision making.
www.modusrobotics.com
Presentation used at Mobile Ghent 2013 for the paper "Mobility Collector: Battery Conscious Mobile Tracking".
Paper link: http://www.tandfonline.com/doi/full/10.1080/17489725.2014.973917
At SIGGRAPH 2018, Colin Barré-Brisebois presented PICA PICA running on NVIDIA's new Turing architecture, with performance comparisons against Volta. A technique for real-time ray-traced transparent shadows, developed by Henrik Halén of SEED, was also presented, along with an experiment with rough glass.
Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo... (Pratik Ramdasi)
This presentation describes the Parametric Time Domain Method (PTDM) for separating cloud and drizzle moments for the W-band ARM cloud radar located at Graciosa Island, Portugal.
Earth Viewing Systems Satellite Sensor Project, for Professor DiNardo's Course.
The presentation was given on 14 May 2009.
______________________________________
I realize that some of the graphics do not have their sources cited, but I did not make those slides, and the group members who made them did not remember their sources. Please forgive this oversight; I consider it important for students of the earth surveillance class at The City College of New York (and elsewhere) that old presentations remain available to them.
If, however, you can tell me the sources of the graphics that you see, I will be grateful and happy to cite them.
Test time efficient group delay filter characterization technique using a dis... (Pete Sarson, Ph.D.)
Measuring a filter's group delay in production is never easy, and measuring the group-delay characteristic in a quick and timely manner is difficult to say the least. This paper discusses a simple method, expanding on the author's previous work, that demonstrates how to measure the group delay of a filter and how accurately the technique correlates with measurements of the silicon's performance made in the lab. The test-time saving and the stability of the results are shown, as well as the advantages of the technique with regard to having full characterization data available in a production program. Finally, it is shown how the work can be developed further into a potentially more efficient technique.
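The paper's production-test technique is not reproduced here, but the quantity being measured is the group delay, defined as the negative derivative of the phase response with respect to frequency. A generic NumPy sketch of that definition (illustrative only, not the author's method):

```python
import numpy as np

def group_delay(h, n_fft=1024):
    """Estimate the group delay (in samples) of an FIR filter h as
    tau_g(w) = -d(phase)/d(w), using a finite difference of the
    unwrapped phase of the filter's frequency response."""
    w = 2 * np.pi * np.arange(n_fft // 2) / n_fft
    H = np.fft.fft(h, n_fft)[: n_fft // 2]
    phase = np.unwrap(np.angle(H))
    tau = -np.diff(phase) / np.diff(w)
    w_mid = (w[:-1] + w[1:]) / 2  # frequencies at the difference midpoints
    return w_mid, tau
```

For a symmetric (linear-phase) FIR filter of length N this yields a constant group delay of (N-1)/2 samples, a handy sanity check for any measurement setup.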
Behavioral modeling of Clock/Data Recovery (Arrow Devices)
Clock/data recovery (CDR) logic is tricky to implement correctly, and verifying the CDR logic implemented in a design requires the corresponding verification infrastructure to be modeled correctly.
This presentation covers the various issues faced when modeling CDR behaviorally, along with their solutions.
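As a toy illustration of what a behavioral CDR model involves (this is a generic first-order bang-bang loop, not Arrow Devices' verification model):

```python
def bangbang_cdr(input_phase, step=0.01, n_steps=500):
    """Behavioral model of a first-order bang-bang CDR: the phase
    detector reports only early/late, so the recovered sampling phase
    moves by a fixed step toward the incoming data phase and then
    dithers around it (the classic bang-bang limit cycle)."""
    phase = 0.0
    history = []
    for _ in range(n_steps):
        err = input_phase - phase            # phase error at each data edge
        phase += step if err > 0 else -step  # sign-only (bang-bang) update
        history.append(phase)
    return history
```

A verification model must reproduce this dithering behavior faithfully, since the real receiver's sampling point never settles to a single value; that limit cycle is one of the issues a behavioral model has to capture.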
Want to know what's new with these topics? The PROSSA Team is developing projects and searching for new ideas. Take a look at our progress in Space Weather and these topics!
International Journal of Engineering Research and Applications (IJERA) is an open-access, online, peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
Astronomical Data Processing on the LSST Scale with Apache Spark (Databricks)
The next decade promises to be exciting for both astronomy and computer science, with a number of large-scale astronomical surveys in preparation. One of the most important is the Large Synoptic Survey Telescope, or LSST. LSST will produce the first ‘video’ of the deep sky in history by continually scanning the visible sky, taking one 3.2-gigapixel image every 20 seconds. In this talk we will describe LSST’s unique design and how its image-processing pipeline produces catalogs of astronomical objects. To process and quickly cross-match catalog data we built AXS (Astronomy Extensions for Spark), a system based on Apache Spark. We will explain its design and what is behind its great cross-matching performance.
This talk covers how e-science tools are needed for the new data-intensive science, specifically targeted at the Square Kilometre Array. Given at Special Symposium 15 on Data Intensive Astronomy, held during the General Assembly of the International Astronomical Union in Beijing, 2012.
The Square Kilometre Array is currently undergoing the Preliminary Design Reviews for its composing elements, and is thus at a critical point on its way to being ready for construction starting in 2018. In this talk we will provide an overview of the SKA, its composing elements, and their status, with emphasis on the Telescope Manager and the Science Data Processor, respectively the Monitoring & Control system and the Pipeline. We will see how they compare with their ALMA equivalents, and how the SKA is similar to and different from ALMA.
Distance measuring unit with Zigbee protocol, ultrasonic sensor (Ashok Raj)
Using the Zigbee protocol, we developed a distance-measurement unit with an ultrasonic sensor, an Arduino, and an XBee transceiver for communication between the display and the monitoring unit.
Software used: Arduino
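The Arduino sketch itself is not included; the underlying calculation for a typical HC-SR04-style ultrasonic sensor (the sensor model and function name here are assumptions) is just round-trip time of flight:

```python
def distance_cm(echo_time_us, speed_of_sound_m_s=343.0):
    """Convert an ultrasonic echo pulse width (in microseconds) to a
    one-way distance in centimetres. The ping travels out to the target
    and back, hence the division by two."""
    t_s = echo_time_us * 1e-6
    return 100.0 * speed_of_sound_m_s * t_s / 2.0
```

For example, a 1000 µs echo corresponds to about 17 cm. The speed of sound varies with temperature, so production units often apply a temperature correction before this conversion.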
Accelerating Astronomical Discoveries with Apache Spark (Databricks)
Our research group is investigating how to leverage Apache Spark (batch, streaming & real-time) to analyse current and future data sets in astronomy. Among the future large experiments, the Large Synoptic Survey Telescope (LSST) will start soon collecting terabytes of data per observation night, and the efficient processing and analysis of both real-time and historical data remains a major challenge. In this talk we will expose the main challenges and explore the latest developments tailored for big data problems in astronomy.
On the one hand, we designed a new Data Source API extension to natively manipulate telescope images and astronomical tables within Apache Spark. We then extended the functionalities of the Apache Spark SQL module to ease the manipulation of 3D data sets and perform efficient queries: partitioning, data-set joins and cross-matching, nearest-neighbour search, spatial queries, and more.
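The Spark implementation is not shown here, but the zoning idea that makes such catalog cross-matches partitionable can be sketched in plain Python (function names and the brute-force inner loop are illustrative; the real systems use a sort-merge join over the same declination zones):

```python
import math
from collections import defaultdict

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula), all angles in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def zone_of(dec, height_deg=1.0):
    """Bucket a declination into a horizontal sky 'zone'. Sources can only
    match within the same or an adjacent zone, which turns an all-pairs
    problem into small per-zone problems that partition cleanly."""
    return int(math.floor((dec + 90.0) / height_deg))

def crossmatch(cat1, cat2, radius_deg=1.0 / 3600):
    """Match (ra, dec) pairs from cat1 to cat2 within radius_deg."""
    zones = defaultdict(list)
    for j, (ra, dec) in enumerate(cat2):
        zones[zone_of(dec)].append(j)
    matches = []
    for i, (ra1, dec1) in enumerate(cat1):
        for dz in (-1, 0, 1):  # adjacent zones catch border cases
            for j in zones[zone_of(dec1) + dz]:
                if angular_sep_deg(ra1, dec1, *cat2[j]) <= radius_deg:
                    matches.append((i, j))
    return matches
```

Because each zone only ever needs its neighbours, the catalog can be partitioned by zone across a cluster and matched locally, which is the essence of the distributed cross-match described above.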
On the other hand we are using the new possibilities offered by Structured Streaming APIs in recent Apache Spark versions to enable real-time decisions by rapidly accessing and analysing the alerts sent by telescopes every night. Given the unprecedented precision of next generation of telescopes, the streams of alerts will be made of millions of alerts per night, and relying on Structured Streaming is a guarantee of not missing the latest Black Hole event in a sea of data! We will also share active learning developments used on top to improve real-time event selection and classification for the LSST telescope.
You will walk away with an understanding of modern challenges in astronomy, an appreciation of some beautiful night skies, and a sense of how Apache Spark can help push the frontiers of science further!
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma... (EarthCube)
Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.
By Martin Kunz, Lawrence Berkeley National Laboratory
In this video from PASC18, Alexander Nitz from the Max Planck Institute for Gravitational Physics in Germany presents: The Search for Gravitational Waves.
"The LIGO and Virgo detectors have completed a prolific observation run. We are now observing gravitational waves from both the mergers of binary black holes and neutron stars. We’ll discuss how these discoveries were made and look into what the near future of searching for gravitational waves from compact binary mergers will look like."
Watch the video: https://wp.me/p3RLHQ-iTv
Learn more: github.com/gwastro/pycbc
and
https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Burst data retrieval after 50k GPU Cloud run (Igor Sfiligoi)
We ran a 50k-GPU multi-cloud simulation to support IceCube science. This talk provides an overview of what happened to the associated data.
Presented at the Internet2 booth at SC19.
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe | InfluxDays Virtual Experience NA 2020
In this deck from the Stanford HPC Conference, Shahin Khan from OrionX describes major market shifts in IT.
"We will discuss the digital infrastructure of the future enterprise and the state of these trends."
"We work with clients on the impact of Digital Transformation (DX) on them, their customers, and their messages. Generally, they want to track, in one place, trends like IoT, 5G, AI, Blockchain, and Quantum Computing. And they want to know what these trends mean, how they affect each other, and when they demand action, and how to formulate and execute an effective plan. If that describes you, we can help."
Watch the video: https://wp.me/p3RLHQ-lPP
Learn more: http://orionx.net
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Preparing to program Aurora at Exascale - Early experiences and future direct... (inside-BigData.com)
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long-established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively-new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather, will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Greg Wahl from Advantech presents: Transforming Private 5G Networks.
Advantech Networks & Communications Group is driving innovation in next-generation network solutions with their High Performance Servers. We provide business critical hardware to the world's leading telecom and networking equipment manufacturers with both standard and customized products. Our High Performance Servers are highly configurable platforms designed to balance the best in x86 server-class processing performance with maximum I/O and offload density. The systems are cost effective, highly available and optimized to meet next generation networking and media processing needs.
“Advantech’s Networks and Communication Group has been both an innovator and trusted enabling partner in the telecommunications and network security markets for over a decade, designing and manufacturing products for OEMs that accelerate their network platform evolution and time to market,” said Ween Niu, Advantech Vice President of the Networks & Communications Group. “In the new IP Infrastructure era, we will be expanding our expertise in Software Defined Networking (SDN) and Network Function Virtualization (NFV), two of the essential conduits to 5G infrastructure agility, making networks easier to install, secure, automate and manage in a cloud-based infrastructure.”
In addition to innovation in air interface technologies and architecture extensions, 5G will also need a new generation of network computing platforms to run the emerging software defined infrastructure, one that provides greater topology flexibility, essential to deliver on the promises of high availability, high coverage, low latency and high bandwidth connections. This will open up new parallel industry opportunities through dedicated 5G network slices reserved for specific industries dedicated to video traffic, augmented reality, IoT, connected cars etc. 5G unlocks many new doors and one of the keys to its enablement lies in the elasticity and flexibility of the underlying infrastructure.
Advantech’s corporate vision is to enable an intelligent planet. The company is a global leader in the fields of IoT intelligent systems and embedded platforms. To embrace the trends of IoT, big data, and artificial intelligence, Advantech promotes IoT hardware and software solutions with the Edge Intelligence WISE-PaaS core to assist business partners and clients in connecting their industrial chains. Advantech is also working with business partners to co-create business ecosystems that accelerate the goal of industrial intelligence.
Watch the video: https://wp.me/p3RLHQ-lPQ
* Company website: https://www.advantech.com/
* Solution page: https://www2.advantech.com/nc/newsletter/NCG/SKY/benefits.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The Incorporation of Machine Learning into Scientific Simulations at Lawrence... (inside-BigData.com)
In this deck from the Stanford HPC Conference, Katie Lewis from Lawrence Livermore National Laboratory presents: The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory.
"Scientific simulations have driven computing at Lawrence Livermore National Laboratory (LLNL) for decades. During that time, we have seen significant changes in hardware, tools, and algorithms. Today, data science, including machine learning, is one of the fastest growing areas of computing, and LLNL is investing in hardware, applications, and algorithms in this space. While the use of simulations to focus and understand experiments is well accepted in our community, machine learning brings new challenges that need to be addressed. I will explore applications for machine learning in scientific simulations that are showing promising results and further investigation that is needed to better understand its usefulness."
Watch the video: https://youtu.be/NVwmvCWpZ6Y
Learn more: https://computing.llnl.gov/research-area/machine-learning
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod... (inside-BigData.com)
In this deck from the Stanford HPC Conference, DK Panda from Ohio State University presents: How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems?
"This talk will start with an overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of- core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented."
Watch the video: https://youtu.be/LeUNoKZVuwQ
Learn more: http://web.cse.ohio-state.edu/~panda.2/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ... (inside-BigData.com)
In this deck from the Stanford HPC Conference, Nick Nystrom and Paola Buitrago provide an update from the Pittsburgh Supercomputing Center.
Nick Nystrom is Chief Scientist at the Pittsburgh Supercomputing Center (PSC). Nick is architect and PI for Bridges, PSC's flagship system that successfully pioneered the convergence of HPC, AI, and Big Data. He is also PI for the NIH Human Biomolecular Atlas Program’s HIVE Infrastructure Component and co-PI for projects that bring emerging AI technologies to research (Open Compass), apply machine learning to biomedical data for breast and lung cancer (Big Data for Better Health), and identify causal relationships in biomedical big data (the Center for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence). His current research interests include hardware and software architecture, applications of machine learning to multimodal data (particularly for the life sciences) and to enhance simulation, and graph analytics.
Watch the video: https://youtu.be/LWEU1L1o7yY
Learn more: https://www.psc.edu/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Ryan Quick from Providentia Worldwide describes how DNNs can be used to improve EDA simulation runs.
"Systems Intelligence relies on a variety of methods for providing insight into the core mechanisms for driving automated behavioral changes in self-healing command and control platforms. This talk reports on initial efforts with leveraging Semiconductor Electronic Design Automation (EDA) telemetry data from cross-domain sources including power, network, storage, nodes, and applications in neural networks as a driving method for insight into SI automation systems."
Watch the video: https://youtu.be/2WbR8tq-XbM
Learn more: http://www.providentiaworldwide.com/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring (inside-BigData.com)
In this deck from the Stanford HPC Conference, Nicole Xu from Stanford University describes how she transformed a common jellyfish into a bionic creature that is part animal and part machine.
"Animal locomotion and bioinspiration have the potential to expand the performance capabilities of robots, but current implementations are limited. Mechanical soft robots leverage engineered materials and are highly controllable, but these biomimetic robots consume more power than corresponding animal counterparts. Biological soft robots from a bottom-up approach offer advantages such as speed and controllability but are limited to survival in cell media. Instead, biohybrid robots that comprise live animals and self- contained microelectronic systems leverage the animals’ own metabolism to reduce power constraints and body as an natural scaffold with damage tolerance. We demonstrate that by integrating onboard microelectronics into live jellyfish, we can enhance propulsion up to threefold, using only 10 mW of external power input to the microelectronics and at only a twofold increase in cost of transport to the animal. This robotic system uses 10 to 1000 times less external power per mass than existing swimming robots in literature and can be used in future applications for ocean monitoring to track environmental changes."
Watch the video: https://youtu.be/HrmJFyvInj8
Learn more: https://sanfrancisco.cbslocal.com/2020/02/05/stanford-research-project-common-jellyfish-bionic-sea-creatures/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Peter Dueben from the European Centre for Medium-Range Weather Forecasts (ECMWF) presents: Machine Learning for Weather Forecasts.
"I will present recent studies that use deep learning to learn the equations of motion of the atmosphere, to emulate model components of weather forecast models and to enhance usability of weather forecasts. I will than talk about the main challenges for the application of deep learning in cutting-edge weather forecasts and suggest approaches to improve usability in the future."
Peter is contributing to the development and optimization of weather and climate models for modern supercomputers. He is focusing on a better understanding of model error and model uncertainty, on the use of reduced numerical precision that is optimised for a given level of model error, on global cloud-resolving simulations with ECMWF's forecast model, and on the use of machine learning, in particular deep learning, to improve the workflow and predictions. Peter graduated in physics and wrote his PhD thesis at the Max Planck Institute for Meteorology in Germany. He worked as a postdoc with Tim Palmer at the University of Oxford and took up a position as a University Research Fellow of the Royal Society at the European Centre for Medium-Range Weather Forecasts (ECMWF) in 2017.
Watch the video: https://youtu.be/ks3fkRj8Iqc
Learn more: https://www.ecmwf.int/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Gilad Shainer from the HPC AI Advisory Council describes how this organization fosters innovation in the high performance computing community.
"The HPC-AI Advisory Council’s mission is to bridge the gap between high-performance computing (HPC) and Artificial Intelligence (AI) use and its potential, bring the beneficial capabilities of HPC and AI to new users for better research, education, innovation and product manufacturing, bring users the expertise needed to operate HPC and AI systems, provide application designers with the tools needed to enable parallel computing, and to strengthen the qualification and integration of HPC and AI system products."
Watch the video: https://wp.me/p3RLHQ-lNz
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Today RIKEN in Japan announced that the Fugaku supercomputer will be made available for research projects aimed at combating COVID-19.
"Fugaku is currently being installed and is scheduled to be available to the public in 2021. However, faced with the devastating disaster unfolding before our eyes, RIKEN and MEXT decided to make a portion of the computational resources of Fugaku available for COVID-19-related projects ahead of schedule while continuing the installation process.
Fugaku is being developed not only for the progress in science, but also to help build the society dubbed as the “Society 5.0” by the Japanese government, where all people will live safe and comfortable lives. The current initiative to fight against the novel coronavirus is driven by the philosophy behind the development of Fugaku."
Initial Projects
Exploring new drug candidates for COVID-19 by "Fugaku"
Yasushi Okuno, RIKEN / Kyoto University
Prediction of conformational dynamics of proteins on the surface of SARS-Cov-2 using Fugaku
Yuji Sugita, RIKEN
Simulation analysis of pandemic phenomena
Nobuyasu Ito, RIKEN
Fragment molecular orbital calculations for COVID-19 proteins
Yuji Mochizuki, Rikkyo University
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, which implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime; manual instrumentation, which can be combined with automatic instrumentation, allows the code developer to annotate regions of particular interest.
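The tuning-model idea described above can be sketched in a few lines (illustrative class and field names only; this is not MERIC's actual interface):

```python
class TuningModel:
    """READEX-style tuning model sketch: at design time, record the
    lowest-energy system configuration observed for each significant
    region; at runtime, switch to that configuration whenever the
    region is entered."""

    def __init__(self):
        self._best = {}  # region name -> (config dict, energy in joules)

    def learn(self, region, config, energy_j):
        """Design-time analysis: keep the cheapest configuration seen
        for this region (a runtime situation)."""
        best = self._best.get(region)
        if best is None or energy_j < best[1]:
            self._best[region] = (config, energy_j)

    def config_for(self, region, default):
        """Runtime: return the tuned configuration for a region, or a
        default if the region was never analysed."""
        best = self._best.get(region)
        return best[0] if best else default
```

In the real methodology, runtime situations with the same best configuration are first grouped into scenarios, so the model stores one configuration per scenario rather than per individual region instance.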
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from GTC Digital, William Beaudin from DDN presents: HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD.
Enabling high performance computing through the use of GPUs requires an incredible amount of IO to sustain application performance. We'll cover architectures that enable extremely scalable applications through the use of NVIDIA’s SuperPOD and DDN’s A3I systems.
The NVIDIA DGX SuperPOD is a first-of-its-kind artificial intelligence (AI) supercomputing infrastructure. DDN A³I with the EXA5 parallel file system is a turnkey, AI data storage infrastructure for rapid deployment, featuring faster performance, effortless scale, and simplified operations through deeper integration. The combined solution delivers groundbreaking performance, deploys in weeks as a fully integrated system, and is designed to solve the world's most challenging AI problems.
Watch the video: https://wp.me/p3RLHQ-lIV
Learn more: https://www.ddn.com/download/nvidia-superpod-ddn-a3i-ai400-appliance-with-the-exa5-filesystem/
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to Aarch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Versal Premium ACAP for Network and Cloud Acceleration – inside-BigData.com
Today Xilinx announced Versal Premium, the third series in the Versal ACAP portfolio. The Versal Premium series features highly integrated, networked and power-optimized cores and the industry’s highest bandwidth and compute density on an adaptable platform. Versal Premium is designed for the highest bandwidth networks operating in thermally and spatially constrained environments, as well as for cloud providers who need scalable, adaptable application acceleration.
Versal is the industry’s first adaptive compute acceleration platform (ACAP), a revolutionary new category of heterogeneous compute devices with capabilities that far exceed those of conventional silicon architectures. Developed on TSMC’s 7-nanometer process technology, Versal Premium combines software programmability with dynamically configurable hardware acceleration and pre-engineered connectivity and security features to enable a faster time-to-market. The Versal Premium series delivers up to 3X higher throughput compared to current generation FPGAs, with built-in Ethernet, Interlaken, and cryptographic engines that enable fast and secure networks. The series doubles the compute density of currently deployed mainstream FPGAs and provides the adaptability to keep pace with increasingly diverse and evolving cloud and networking workloads.
Learn more: https://insidehpc.com/2020/03/xilinx-announces-versal-premium-acap-for-network-and-cloud-acceleration/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently – inside-BigData.com
In this video from the Rice Oil & Gas Conference, Chin Fang from Zettar presents: Moving Massive Amounts of Data across Any Distance Efficiently.
The objective of this talk is to present two on-going projects aiming at improving and ensuring highly efficient bulk transferring or streaming of massive amounts of data over digital connections across any distance. It examines the current state of the art, a few very common misconceptions, the differences among the three major types of data movement solutions, a current initiative attempting to improve data movement efficiency from the ground up, and another multi-stage project that shows how to conduct long-distance, large-scale data movement at speed and scale internationally. Both projects have real-world motivations, e.g. the ambitious data transfer requirements of Linac Coherent Light Source II (LCLS-II) [1], a premier preparation project of the U.S. DOE Exascale Computing Initiative (ECI) [2]. Their immediate goals are described and explained, together with the solution used for each. Findings and early results are reported. Possible future work is outlined.
Watch the video: https://wp.me/p3RLHQ-lBX
Learn more: https://www.zettar.com/
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Rice Oil & Gas Conference, Bradley McCredie from AMD presents: Scaling TCO in a Post Moore's Law Era.
"While foundries bravely drive forward to overcome the technical and economic challenges posed by scaling to 5nm and beyond, Moore’s law alone can provide only a fraction of the performance / watt and performance / dollar gains needed to satisfy the demands of today’s high performance computing and artificial intelligence applications. To close the gap, multiple strategies are required. First, new levels of innovation and design efficiency will supplement technology gains to continue to deliver meaningful improvements in SoC performance. Second, heterogenous compute architectures will create x-factor increases of performance efficiency for the most critical applications. Finally, open software frameworks, APIs, and toolsets will enable broad ecosystems of application level innovation."
Watch the video:
Learn more: http://amd.com
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
CUDA-Python and RAPIDS for blazing fast scientific computing – inside-BigData.com
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
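As a flavour of the Numba side of this pairing, here is a minimal CUDA kernel written in Python. The SAXPY example is our own illustration (it is not code from the talk), and it falls back to a NumPy reference when Numba or a CUDA device is unavailable.

```python
import numpy as np

def saxpy_cpu(a, x, y):
    # NumPy reference; also the fallback when no CUDA device is present
    return a * x + y

try:
    from numba import cuda

    @cuda.jit
    def saxpy_kernel(a, x, y, out):
        i = cuda.grid(1)              # global thread index
        if i < out.size:
            out[i] = a * x[i] + y[i]  # out = a*x + y, one element per thread

    def saxpy(a, x, y):
        try:
            out = np.empty_like(x)
            # forall() picks a launch configuration covering out.size threads
            saxpy_kernel.forall(out.size)(a, x, y, out)
            return out
        except Exception:             # no usable GPU: fall back to the reference
            return saxpy_cpu(a, x, y)
except ImportError:
    saxpy = saxpy_cpu
```

The just-in-time compiled kernel is written entirely in Python, which is exactly the rapid-prototyping workflow the abstract describes.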
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from FOSDEM 2020, Colin Sauze from Aberystwyth University describes the development of a RaspberryPi cluster for teaching an introduction to HPC.
"The motivation for this was to overcome four key problems faced by new HPC users:
* The availability of a real HPC system and the effect running training courses can have on it; conversely, the limited availability of spare resources on the real system can cause problems for the training course.
* A fear of using a large and expensive HPC system for the first time and worries that doing something wrong might damage the system.
* That HPC systems are very abstract systems sitting in data centres that users never see, which makes it difficult for them to understand exactly what it is they are using.
* That new users fail to understand resource limitations, in part because, with the vast resources of modern HPC systems, many mistakes can be made before resources run out. A more resource-constrained system makes this easier to understand.
The talk will also discuss some of the technical challenges in deploying an HPC environment to a Raspberry Pi and attempts to keep that environment as close to a "real" HPC system as possible. The issues of trying to automate the installation process will also be covered."
Learn more: https://github.com/colinsauze/pi_cluster
and
https://fosdem.org/2020/schedule/events/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from ATPESC 2019, Ken Raffenetti from Argonne presents an overview of HPC interconnects.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two-week training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-luc
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains come only when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
The Art of the Pitch: WordPress Relationships and Sales – Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
AstroAccelerate - GPU Accelerated Signal Processing on the Path to the Square Kilometre Array
1. www.oerc.ox.ac.uk
AstroAccelerate
GPU accelerated signal processing on the path to the
Square Kilometre Array
Wes Armour, Karel Adamek,
Sofia Dimoudi, Jan Novotny, Nassim Ouannough, Cees Carels
Oxford e-Research Centre,
Department of Engineering Science
University of Oxford
20th March 2019
3. What is SKA?
What does SKA stand for?
Square Kilometre Array, so called because it
will have an effective collecting area of a
square kilometre.
What is SKA?
SKA is a ground based radio telescope that
will span continents.
Where will SKA be located?
SKA will be built in South Africa and
Australia.
Core
Station
Graphic courtesy of Anne Trefethen
Example of
proposed SKA
configuration
4. SKA science
SKA will study a wide range of science cases
and aims to answer some of the fundamental
questions mankind has about the universe we
live in.
• How do galaxies evolve?
– What is dark energy?
• Tests of General Relativity
– Was Einstein correct?
• Probing the cosmic dawn
– How did stars form?
• The cradle of life
– Are we alone in the Universe?
6. https://commons.wikimedia.org/wiki/File:Planets_and_sun_size_comparison.jpg (Author: Lsmpascal)
Sun
Pulsars – size and scale
Earth
Pulsars are magnetized, rotating neutron
stars which emit synchrotron radiation from
their poles (Crab Nebula). They are typically
1-3 Solar masses in size, have a diameter of
10-20 Kilometres and a pulse period
ranging from milliseconds to seconds.
Their magnetic field is offset from the axis
of rotation so we observe them as cosmic
lighthouses.
Pulsar
Amherst College
Hester et al.
7. Credit: FRB110220 Dan Thornton (Manchester)
SKA time domain science - Fast Radio Bursts
Fast Radio Bursts (FRBs) were first
discovered in 2007 by Lorimer et al.
They are observed as extremely
bright single pulses that are
extremely dispersed (meaning that
they are likely to be far away, maybe
extra galactic).
So far around 15 have been observed
in survey data. They are of unknown
origin, but likely to represent some of
the most extreme physics in our
Universe.
Hence they are extremely interesting
objects to study.
9. SKA time domain - data rates
The SKA will produce vast amounts of data. In the
case of time-domain science we expect the
telescope to be able to place ~2000 observing
beams on the sky at any one time (these are
trivially parallel to compute).
The telescope will take 20,000 samples per
second for each of those beams and then it will
measure power in 4096 frequency channels for
each time sample. Each of those individual
samples comprises 4×8 bits, although we are
only really interested in one of the four 8-bit
values.
Doing the math tells us that we will need to
process 160GB/s of relevant data. This is
approximately equal to analysing 50 hours of HD
television data per second.
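The arithmetic above can be checked directly, taking one byte per sample (only one 8-bit value per sample is of interest):

```python
# Back-of-envelope check of the SKA time-domain data rate quoted above.
beams = 2000              # simultaneous observing beams
samples_per_sec = 20_000  # time samples per second per beam
channels = 4096           # frequency channels per time sample
bytes_per_value = 1       # one relevant 8-bit value per sample

rate_bytes = beams * samples_per_sec * channels * bytes_per_value
print(rate_bytes / 1e9)   # 163.84 GB/s, i.e. the ~160 GB/s quoted
```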
The most costly computational operations
in data processing pipeline are
DDTR ~ O(ndms * nbeams * nsamps * nchans )
FDAS ~ O(ndms * nbeams * nsamps * nacc * log(nsamps) * 1/tobs )
Requiring ~2 PetaFLOPS of compute!
10. SKA time domain - signal processing
The time domain team is an
international team led by Oxford
and Manchester.
It aims to deliver an end-to-end
signal processing pipeline for
time domain science performed
by SKA (see right).
Our work at OeRC has
focussed on vertical prototyping
activities. We are interested in
using many-core technologies,
such as GPUs to perform the
processing steps within the
signal processing pipeline with
the aim of achieving real-time
processing for the SKA.
Image courtesy of Aris Karastergiou
Time Domain Team
Search for periodic signals
search for fast radio bursts
12. AstroAccelerate
AstroAccelerate is a GPU enabled
software package that focuses on
achieving real-time processing of
time-domain radio-astronomy data. It
uses the CUDA programming
language for NVIDIA GPUs.
The massive computational power of
modern day GPUs allows the code to
perform algorithms such as de-
dispersion, single pulse searching and
Fourier Domain Acceleration
Searching in real-time on very large
data-sets which are comparable to
those which will be produced by next
generation radio-telescopes such as
the SKA. https://github.com/AstroAccelerateOrg/astro-accelerate
13. AstroAccelerate - Signal Processing
De-dispersion
Periodicity Search
Harmonic Sum
(Deep dive two)
Fourier Domain Acceleration search
Single Pulse Search
(Deep dive one)
Radio Frequency Interference Mitigation
14. AstroAccelerate - API
• API follows a simple pattern: configure, bind, run.
• Select which pipeline modules to run, configure module plan, then bind plan to the API.
• API calculates the strategy with the optimal configuration for the plan.
• When all strategy objects are ready, the user selected modules are run within a pipeline.
Flow: bind input data to the API → select pipeline modules → configure module plans → bind plans to the API → API calculates the optimal strategy → run pipeline (C++/Python).
Cees Carels
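The configure → bind → run pattern above can be sketched like this; the class and method names are hypothetical stand-ins for illustration, not AstroAccelerate's real C++/Python API.

```python
class Api:
    """Hypothetical sketch of the configure -> bind -> run pattern."""

    def __init__(self):
        self.plans, self.strategies, self.data = [], [], None

    def bind_plan(self, module, **settings):
        # a "plan" is a module selection plus its configuration
        self.plans.append({"module": module, **settings})

    def bind_data(self, data):
        self.data = data

    def prepare(self):
        # the API derives a ready-to-run strategy for each bound plan
        self.strategies = [dict(p, ready=True) for p in self.plans]

    def run(self):
        # once all strategy objects are ready, run the selected modules
        assert self.strategies and all(s["ready"] for s in self.strategies)
        return [s["module"] for s in self.strategies]

api = Api()
api.bind_plan("dedispersion", dm_low=0, dm_high=100)
api.bind_plan("single_pulse_search")
api.bind_data([0.0] * 1024)
api.prepare()
print(api.run())  # ['dedispersion', 'single_pulse_search']
```

The point of the split is that the user only supplies plans; the optimal launch strategy is computed by the API before anything runs.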
15. AstroAccelerate - Code Features
• Usable as a library (.so) and/or standalone executable.
• Examples with instructions on how to compile and link.
• Regular releases (semantic versioning).
• CMake build system.
• Full doxygen documentation and readme.
• Automated CI, unit tests.
Cees Carels
18. Single Pulse Search
Single pulse search (SPS) could be done through matched filters; these
are very sensitive, but struggle with “quickly”.
Using a Boxcar filter for the single pulse search (SPS):
• Allows us to reuse data
• Independent of pulse shape
• We can trade sensitivity for performance
• Less sensitive by design
Aim is to detect pulses of different shapes and widths at unknown position
within the signal and do it quickly.
19. Single Pulse Search: How to detect pulses with boxcars
Position of the boxcar is important
We quantify coverage of the pulses by the
distance between boxcar filters L.
• Pulse may end up between boxcars
• By decreasing L we cover pulses better
SNR is
• Increased by adding signal
• Decreased by adding noise
A signal’s strength is measured as the signal-to-noise ratio (SNR):
SNR = (x − μ) / σ,
where x is the sample value, μ is the mean and σ is the standard deviation.
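A minimal NumPy sketch of the idea (our own illustration, not AstroAccelerate code): slide boxcars of several widths over the series and keep, per sample, the highest SNR over all widths. The √W normalisation keeps noise-only SNR comparable across widths.

```python
import numpy as np

def best_boxcar_snr(x, widths):
    """For each sample, the highest boxcar SNR over the given widths W."""
    mu, sigma = x.mean(), x.std()
    best = np.full(x.size, -np.inf)
    for w in widths:
        # boxcar sum of w consecutive samples around each position
        sums = np.convolve(x, np.ones(w), mode="same")
        # SNR of a sum of w samples: subtract w*mu, noise grows as sqrt(w)
        snr = (sums - w * mu) / (sigma * np.sqrt(w))
        best = np.maximum(best, snr)  # keep only the highest SNR per sample
    return best
```

A boxcar whose width matches the pulse raises the SNR by roughly √W over a single-sample detection, which is why a range of widths W is searched.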
20. Single Pulse Search: How to detect pulses with boxcars
A boxcar which is:
• too short does not cover the pulse fully
• too long adds unnecessary noise
We need different boxcar widths W to
better detect different pulse widths.
SNR is
• Increased by adding signal
• Decreased by adding noise
Width of the boxcar filter is also
important
21. Single Pulse Search: What do we need to do?
Summary:
• Position of the boxcar relative to the
pulse is important. This is expressed by
the distance between boxcars L.
• Boxcar width W is important for
detection of pulses with different
widths.
For ideal detection we need to do:
at every point
Output:
Highest SNR detected at given sample.
• We do not need to keep the values of all
boxcar filters, just the highest SNR!
22. Single Pulse Search: Two algorithms
BoxDIT
• Starts from ideal Boxcar filter
• Top-down – starts with good
sensitivity but poor performance
• Easily adjustable
• Can be very sensitive
• Not as fast
IGrid
• Start from decimation in time (DIT)
• Bottom-up – starts with good performance
but poor sensitivity
• Less flexible
• Faster
The algorithm must be able to
• perform very long boxcar filters; for
SKA this is 8000+ samples
• Adjustable sensitivity
How to adjust sensitivity
… and increase performance:
• By decreasing/increasing distance
between boxcars L
• By performing more/fewer boxcars of
different widths W
• Beyond some point it is pointless to
decrease L without adding more widths W
23. Single Pulse Search: BoxDIT
BoxDIT has two steps:
• Decimation in time – used to control
sensitivity
• Ideal boxcar filter (scan) – calculates
the boxcar filters
BoxDIT reuses previously (time-)decimated
data to build longer boxcar widths.
In the GPU implementation both steps are
performed at once: the kernel calculates
boxcar filters as well as the decimation for
the next iteration.
BOTTOM: Using combinations of data at
different decimation levels allows us to
construct longer width boxcars.
Diagram of the BoxDIT algorithm.
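The decimation step can be sketched as follows (our own illustration): each level d holds sums of 2^d consecutive samples, built from the previous level rather than from the raw series, which is exactly the reuse BoxDIT exploits.

```python
import numpy as np

def dit_levels(x, max_level):
    """Level d contains sums of 2**d consecutive samples of x."""
    levels = [np.asarray(x, dtype=float)]
    for _ in range(max_level):
        prev = levels[-1]
        n = (prev.size // 2) * 2          # drop a trailing odd sample
        levels.append(prev[0:n:2] + prev[1:n:2])  # pairwise sums halve the rate
    return levels
```

Combining values from different levels then yields long boxcars cheaply, e.g. a width-12 boxcar as one width-8 partial sum plus one width-4 partial sum, without re-summing raw samples.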
24. Single Pulse Search: BoxDIT Scan at every point
The algorithm for a scan at every point (applying a set of boxcar filters) first calculates a small scan
at every point (here of length 4). The value of the longest boxcar (here 4) is stored into shared
memory.
Stored in registers
Stored into shared memory as well
25. Single Pulse Search: BoxDIT scan part
Showing algorithm steps only for every 4th
thread; other threads do the same thing
for other points.
Each thread keeps values of boxcar filters in
registers. These are increased with every
step of the algorithm.
In each step i, an active thread A calculates
a source thread id as Sᵢ = A − i·4. The value of
the longest boxcar calculated at the beginning
is loaded from the source thread (from shared
memory) and used to calculate longer
boxcars in the active thread. These are kept
by the active thread.
The highest SNR from the newly calculated
boxcars is then compared with the SNR of
the source thread and stored at its position
in shared memory if higher.
26. Single Pulse Search: BoxDIT performance
When calculating 32 boxcars per iteration
(1% signal loss, idealised case) the code is
limited by compute, with 63% device
memory bandwidth utilisation. It is 83x
faster than real time.
When calculating 16 boxcars per
iteration (2% signal loss, idealised) the code
is limited by device memory bandwidth
(86%). It is 170x faster than real time.
Other versions of BoxDIT algorithm
27. Single Pulse Search: Two algorithms
BoxDIT
• Starts from ideal Boxcar filter
• Top-down – starts with good
sensitivity but poor
performance
• Easily adjustable
• Not as fast
IGrid
• Start from decimation in time (DIT)
• Bottom-up – starts with good
performance but poor sensitivity
• Less flexible
• Faster
28. Single Pulse Search: IGrid
The IGrid algorithm is based on a combination of different decimations in time, which are
shifted in time samples.
29. Single Pulse Search: IGRID
This algorithm could be interpreted
also as a binary tree where leaves
indicate a shift in number of time
samples.
The binary tree view suggests what
could be calculated locally (red area).
Therefore the only thing shared
through iterations is the time-series
with zero shift.
It also offers a way to calculate
different boxcar widths, namely by going
towards the root of the tree.
30. Single Pulse Search: IGRID
Calculating individual IGRID iterations
separately is inefficient and very demanding
on device memory bandwidth.
Thus we calculate multiple IGRID
iterations per thread-block.
Legend: points represent layers that
• must be calculated to get the desired sensitivity
• must be calculated but will be recalculated by the next block
• are recalculated layers
• are calculated but not required
TOP: to calculate individual IGRID
iterations, many layers have to be shared.
BOTTOM: Each thread-block calculates
multiple iterations.
31. Single Pulse Search: IGrid performance
IGrid 1 – Signal loss ~6%
IGrid 2 – Signal loss ~4.5%
IGrid 3 – Signal loss ~2%
32. Single Pulse Search: Results
The number of DM trials per second for
SKA-mid sized data is about 6000.
This means BoxDIT is ~83x–200x faster
than real time, and IGRID is ~150x–580x
faster than real time.
Left: Comparison of
algorithms on average
signal loss and
performance (DM trials).
33. Conclusions
• Quantified sources of sensitivity loss
• Adjustable sensitivity
• Two algorithms with different sensitivity/performance
ratio
• BoxDIT algorithm is in AstroAccelerate and used for
science output
35. Harmonic sum
When searching for pulsars using
Fourier-domain methods, the pulsar's
power in the frequency domain is
spread over multiple frequency bins.
The incoherent harmonic sum
algorithm is one way to correct this.
TOP: time-series containing
pulsar (dots)
MIDDLE: frequency-domain
harmonics visible
BOTTOM: result of the harmonic
sum for two different algorithms
36. Harmonic sum
The goal of the harmonic sum is to sum the pulsar's power that was spread into multiple
harmonics, which are integer multiples of the fundamental frequency f₀:

h_n(H) = Σ_{i=1}^{H} P(i·f₀),

where H is the number of harmonics summed and P is the power spectrum.
But we do not know f₀, and we work with discrete indices:
fundamental = 10.33 Hz, first harmonic = 20.66 Hz, second harmonic = 31 Hz
37. Harmonic sum
The de facto pulsar processing code, PRESTO, uses the following formula:

h_n(H) = (1/H) Σ_{i=1}^{H} P(n·i/H),

where H is the number of harmonics summed.
You can also approach it from the other end: start with the index position of the fundamental
frequency n in the frequency domain, X[n], and add to it the higher harmonics, that is X[2n], …
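Both directions can be sketched numerically (our own illustration; the rounding to an integer bin index is an assumption, since the slide only notes that we work with discrete indices):

```python
import numpy as np

def hsum_fundamental_first(P, n, H):
    """PRESTO-style: fold the spectrum near bins n*i/H, i = 1..H, into bin n."""
    return sum(P[round(n * i / H)] for i in range(1, H + 1)) / H

def hsum_harmonics_up(P, n, H):
    """Start at X[n] and add the higher harmonics X[2n], X[3n], ..."""
    return sum(P[i * n] for i in range(1, H + 1))
```

In both cases the access pattern strides through the spectrum with step ~n, which is the unfavourable memory access the next slide discusses.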
39. Harmonic sum – Problems
• Unfavorable access pattern
• Difficult data reuse
• Apply physical constraints
• h*(starting index)
• Stride in memory access is h
• Transpose of the data might help
• Simple data reuse is limited
• Cannot increase efficiency by increasing
number of fundamental freq. bins
• Complicated data reuse is resource and
bookkeeping heavy
• Decreases the frequency range, not the index
range, which we need to explore.
We aim to provide a selection of algorithms with different sensitivity/performance ratios.
40. Harmonic sum – Tree view
There are a few possibilities for how to do the harmonic sum:

Create all possible sums (exhaustive search) – best
possible precision.

Construct the sum from the maxima of each harmonic,
that is h_n(H) = Σ_{i=1}^{H} max_{1≤j≤i} x(i·n + j)

A greedy algorithm which selects the highest value:
h_n(H+1) = h_n(H) + max( x(H·n + j), x(H·n + j + 1) )

Sum only integer multiples of the fundamental:
h_n(H) = Σ_{i=1}^{H} x(i·n)
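The greedy and integer-multiple variants above can be sketched as follows (our own illustration; j is the small bin offset from the formulas above, and starting the greedy sum at the fundamental bin x[n] is an illustrative choice):

```python
def greedy_harmonic_sum(x, n, H, j=0):
    """Greedy variant: at each harmonic, add whichever of the two candidate
    bins has the larger power, per h_n(H+1) = h_n(H) + max(x[...], x[...+1])."""
    h = x[n]                       # H = 1 term: the fundamental itself
    for k in range(2, H + 1):
        h += max(x[k * n + j], x[k * n + j + 1])
    return h

def integer_multiple_sum(x, n, H):
    """Simplest variant: h_n(H) = sum of x[i*n] for i = 1..H."""
    return sum(x[i * n] for i in range(1, H + 1))
```

The greedy variant trades a little extra work per harmonic for tolerance to harmonics that fall between bins, which is where the simple integer-multiple sum loses the most sensitivity.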
45. Harmonic sum – Simple
• Simple harmonic sum
Data are transposed, which improves
data access.
Device memory bandwidth limited (77%).
No explicit data reuse; caching is poor
(10% L2 hit rate).
Memory access pattern when threads process
neighboring fundamental frequency bins.
Memory access pattern when threads process
same fundamental frequency bin from different
data.
46. Harmonic sum – Greedy
• Greedy harmonic sum
For data reuse we rely on caches
(52% L2 hit rate)
– the kernel is waiting for data.
Device memory bandwidth utilisation
(66%).
Memory access pattern when threads process
neighboring fundamental frequency bins.
Memory access pattern when threads process
same fundamental frequency bin from different
data.
47. Harmonic sum – Presto, MaxDIT
Presto harmonic sum
Limited by type conversion (array
indexing). However, fp32 compute
utilisation is still high.
Max harmonic sum
Max harmonic sum is a two-step algorithm:
1) Calculate all max decimations –
limited by compute (load/store,
floating point operations)
2) Calculate partial sums and SNR
49. Harmonic sum – Results
As a gold standard we have used our
implementation of PRESTO's HRMS.
RIGHT: Sensitivity loss for pulsar
frequencies which are between
frequency bins. 50% decrease for
simple HRMS, only 30% for Greedy
and PRESTO.
LEFT: average sensitivity as it
depends on pulsar’s frequency.
50. Harmonic sum – Conclusions
• We have multiple algorithms with different parameters
• We need more sensitivity tests and tests of physical
correctness (artificial data, real data)
• We are thinking about a 2D harmonic sum for acceleration
searches
• We are trying to increase performance