Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUMahesh Khadatare
This poster presents Weather Research and Forecast (WRF)
model is a next-generation mesoscale numerical weather
prediction system designed to serve both operational forecasting
and atmospheric research communities. WRF offers multiple
physics options, one of which is the Long-Wave Rapid Radiative
Transfer Model (RRTM). Even with the advent of large-scale
parallelism in weather models, much of the performance increase
has came from increasing processor speed rather than increased
parallelism. We present an alternative method of scaling model
performance by exploiting emerging architectures like GPGPU
using the fine-grain parallelism. We claim to get much more than
23.71x, performance gain by using asynchronous data transfer,
use of texture memory and the techniques like loop unrolling.
A presentation looking at the parallelisation of the Vienna ab initio Simulation Package (VASP) code, how to optimise performance and scaling by tuning the input control tags, and some initial experience with the NVIDIA GPU (CUDA) port of the code, including compiling, running jobs, and some initial benchmark results.
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesIntel® Software
Orbital representations that are based on B-splines are widely used in quantum Monte Carlo (QMC) simulations of solids, which historically take as much as 50 percent of the total runtime. Random access to a large four-dimensional array make it challenging to efficiently use caches and wide vector units in modern CPUs. So, we present node-level optimizations of B-spline evaluations on multicore and manycore shared memory processors.
To increase single instruction multiple data (SIMD) efficiency and bandwidth utilization, we first apply data layout transformation from an array of structures (AoS) to a structure of arrays (SoA). Then, by blocking SoA objects, we optimize cache reuse and get sustained throughput for a range of problem sizes. We implement efficient nested threading in B-spline orbital evaluation kernels, paving the way towards enabling strong scaling of QMC simulations, resulting with performance enhancements. Finally, we employ roofline performance analysis to model the impacts of our optimizations.
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUMahesh Khadatare
This poster presents Weather Research and Forecast (WRF)
model is a next-generation mesoscale numerical weather
prediction system designed to serve both operational forecasting
and atmospheric research communities. WRF offers multiple
physics options, one of which is the Long-Wave Rapid Radiative
Transfer Model (RRTM). Even with the advent of large-scale
parallelism in weather models, much of the performance increase
has came from increasing processor speed rather than increased
parallelism. We present an alternative method of scaling model
performance by exploiting emerging architectures like GPGPU
using the fine-grain parallelism. We claim to get much more than
23.71x, performance gain by using asynchronous data transfer,
use of texture memory and the techniques like loop unrolling.
A presentation looking at the parallelisation of the Vienna ab initio Simulation Package (VASP) code, how to optimise performance and scaling by tuning the input control tags, and some initial experience with the NVIDIA GPU (CUDA) port of the code, including compiling, running jobs, and some initial benchmark results.
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesIntel® Software
Orbital representations that are based on B-splines are widely used in quantum Monte Carlo (QMC) simulations of solids, which historically take as much as 50 percent of the total runtime. Random access to a large four-dimensional array make it challenging to efficiently use caches and wide vector units in modern CPUs. So, we present node-level optimizations of B-spline evaluations on multicore and manycore shared memory processors.
To increase single instruction multiple data (SIMD) efficiency and bandwidth utilization, we first apply data layout transformation from an array of structures (AoS) to a structure of arrays (SoA). Then, by blocking SoA objects, we optimize cache reuse and get sustained throughput for a range of problem sizes. We implement efficient nested threading in B-spline orbital evaluation kernels, paving the way towards enabling strong scaling of QMC simulations, resulting with performance enhancements. Finally, we employ roofline performance analysis to model the impacts of our optimizations.
Multi-core GPU – Fast parallel SAR image generationMahesh Khadatare
This poster presents a design for parallel processing of synthetic
aperture radar (SAR) data using multi-core Graphics Processing
Units (GP-GPUs). In this design a two-dimensional image is
reconstructed from received echo pulses and their corresponding
response values. Then the comparative performance of proposed
parallel algorithm implementation is tabulated using different set of
nvidia GPUs i.e. GT, Tesla and Fermi. Analyzing experimentation
results on image of various size 1,024 x 1,024 pixels to 4,096 x
4,096 pixels led us to the conclusion that that nvidia Fermi S2050 have
maximum performance throughput .
Penn State RCC has been a CUDA research center for the last year; this talk provides success stories and challenges. GPU case studies are given, including algorithm details and performance results.
In this deck from the HPC User Forum at Argonne, Andrew Siegel from Argonne presents: ECP Application Development.
"The Exascale Computing Project is accelerating delivery of a capable exascale computing ecosystem for breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security. ECP is chartered with accelerating delivery of a capable exascale computing ecosystem to provide breakthrough modeling and simulation solutions to address the most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. This role goes far beyond the limited scope of a physical computing system. ECP’s work encompasses the development of an entire exascale ecosystem: applications, system software, hardware technologies and architectures, along with critical workforce development."
Watch the video: https://wp.me/p3RLHQ-kSL
Learn more: https://www.exascaleproject.org
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
El Barcelona Supercomputing Center (BSC) fue establecido en 2005 y alberga el MareNostrum, uno de los superordenadores más potentes de España. Somos el centro pionero de la supercomputación en España. Nuestra especialidad es la computación de altas prestaciones - también conocida como HPC o High Performance Computing- y nuestra misión es doble: ofrecer infraestructuras y servicio de supercomputación a los científicos españoles y europeos, y generar conocimiento y tecnología para transferirlos a la sociedad. Somos Centro de Excelencia Severo Ochoa, miembros de primer nivel de la infraestructura de investigación europea PRACE (Partnership for Advanced Computing in Europe), y gestionamos la Red Española de Supercomputación (RES). Como centro de investigación, contamos con más de 456 expertos de 45 países, organizados en cuatro grandes áreas de investigación: Ciencias de la computación, Ciencias de la vida, Ciencias de la tierra y aplicaciones computacionales en ciencia e ingeniería.
We updated the DLA system introductions here, from design, add-on functions, and applications. During the 2018~2019, we developed the tools needed for IC simulation and verification, constructed a quantize-aware & HW-aware training flow, and improved the automation of the verification. We have verified this system through FPGA and solid-state SoC.
Designing HPC & Deep Learning Middleware for Exascale Systemsinside-BigData.com
DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference.
"This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video: http://wp.me/p3RLHQ-glW
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Frank Ham from Cascade Technologies presented this deck at the Stanford HPC Conference.
"A spin-off of the Center for Turbulence Research at Stanford University, Cascade Technologies grew out of a need to bridge between fundamental research from institutions like Stanford University and its application in industries. In a continual push to improve the operability and performance of combustion devices, high-fidelity simulation methods for turbulent combustion are emerging as critical elements in the design process. Multiphysics based methodologies can accurately predict mixing, study flame structure and stability, and even predict product and pollutant concentrations at design and off-design conditions."
Watch the video: http://insidehpc.com/2017/02/best-practices-large-scale-multiphysics/
Learn more: http://www.cascadetechnologies.com
and
http://www.hpcadvisorycouncil.com/events/2017/stanford-workshop/
Sign up for our insideHPC Newsletter: http:/insidehpc.com/newsletter
Parallel Implementation of K Means Clustering on CUDAprithan
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be
time consuming, and in an attempt to minimize this time, our project is a parallel implementation of KMeans
clustering algorithm on CUDA using C. We present the performance analysis and implementation
of our approach to parallelizing K-Means clustering.
NERSC is the production high-performance computing (HPC) center for the United States Department of Energy (DOE) Office of Science. The center supports over 6,000 users in 600 projects, using a variety of applications in materials science, chemistry, biology, astrophysics, high energy physics, climate science, fusion science, and more.
NERSC deployed the Cori system on over 9,000 Intel® Xeon Phi™ processors. This session describes the optimization strategy for porting codes that target traditional manycore architectures to the processors. We also discuss highlights and lessons learned from the optimization process on 20 applications associated with the NERSC Exascale Science Application Program (NESAP).
byteLAKE's expertise across NVIDIA architectures and configurationsbyteLAKE
AI Solutions for Industries | Quality Inspection | Data Insights | AI-accelerated CFD | Self-Checkout | byteLAKE.com
byteLAKE: Empowering Industries with AI Solutions. Embrace cutting-edge technology for advanced quality inspection, data insights, and more. Harness the potential of our CFD Suite, accelerating Computational Fluid Dynamics for heightened productivity. Unlock new possibilities with Cognitive Services: image analytics for precise visual inspection for Manufacturing, sound analytics enabling proactive maintenance for Automotive, and wet line analytics for the Paper Industry. Seamlessly convert data into actionable insights using Data Insights' AI module, enabling advanced predictive maintenance and risk detection. Simplify Restaurant and Retail operations with our efficient self-checkout solution, recognizing meals and groceries and elevating customer satisfaction. Custom AI Development services available for tailored solutions. Discover more at www.byteLAKE.com.
Multi-core GPU – Fast parallel SAR image generationMahesh Khadatare
This poster presents a design for parallel processing of synthetic
aperture radar (SAR) data using multi-core Graphics Processing
Units (GP-GPUs). In this design a two-dimensional image is
reconstructed from received echo pulses and their corresponding
response values. Then the comparative performance of proposed
parallel algorithm implementation is tabulated using different set of
nvidia GPUs i.e. GT, Tesla and Fermi. Analyzing experimentation
results on image of various size 1,024 x 1,024 pixels to 4,096 x
4,096 pixels led us to the conclusion that that nvidia Fermi S2050 have
maximum performance throughput .
Penn State RCC has been a CUDA research center for the last year; this talk provides success stories and challenges. GPU case studies are given, including algorithm details and performance results.
In this deck from the HPC User Forum at Argonne, Andrew Siegel from Argonne presents: ECP Application Development.
"The Exascale Computing Project is accelerating delivery of a capable exascale computing ecosystem for breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security. ECP is chartered with accelerating delivery of a capable exascale computing ecosystem to provide breakthrough modeling and simulation solutions to address the most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. This role goes far beyond the limited scope of a physical computing system. ECP’s work encompasses the development of an entire exascale ecosystem: applications, system software, hardware technologies and architectures, along with critical workforce development."
Watch the video: https://wp.me/p3RLHQ-kSL
Learn more: https://www.exascaleproject.org
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
El Barcelona Supercomputing Center (BSC) fue establecido en 2005 y alberga el MareNostrum, uno de los superordenadores más potentes de España. Somos el centro pionero de la supercomputación en España. Nuestra especialidad es la computación de altas prestaciones - también conocida como HPC o High Performance Computing- y nuestra misión es doble: ofrecer infraestructuras y servicio de supercomputación a los científicos españoles y europeos, y generar conocimiento y tecnología para transferirlos a la sociedad. Somos Centro de Excelencia Severo Ochoa, miembros de primer nivel de la infraestructura de investigación europea PRACE (Partnership for Advanced Computing in Europe), y gestionamos la Red Española de Supercomputación (RES). Como centro de investigación, contamos con más de 456 expertos de 45 países, organizados en cuatro grandes áreas de investigación: Ciencias de la computación, Ciencias de la vida, Ciencias de la tierra y aplicaciones computacionales en ciencia e ingeniería.
We updated the DLA system introductions here, from design, add-on functions, and applications. During the 2018~2019, we developed the tools needed for IC simulation and verification, constructed a quantize-aware & HW-aware training flow, and improved the automation of the verification. We have verified this system through FPGA and solid-state SoC.
Designing HPC & Deep Learning Middleware for Exascale Systemsinside-BigData.com
DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference.
"This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video: http://wp.me/p3RLHQ-glW
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Frank Ham from Cascade Technologies presented this deck at the Stanford HPC Conference.
"A spin-off of the Center for Turbulence Research at Stanford University, Cascade Technologies grew out of a need to bridge between fundamental research from institutions like Stanford University and its application in industries. In a continual push to improve the operability and performance of combustion devices, high-fidelity simulation methods for turbulent combustion are emerging as critical elements in the design process. Multiphysics based methodologies can accurately predict mixing, study flame structure and stability, and even predict product and pollutant concentrations at design and off-design conditions."
Watch the video: http://insidehpc.com/2017/02/best-practices-large-scale-multiphysics/
Learn more: http://www.cascadetechnologies.com
and
http://www.hpcadvisorycouncil.com/events/2017/stanford-workshop/
Sign up for our insideHPC Newsletter: http:/insidehpc.com/newsletter
Parallel Implementation of K Means Clustering on CUDAprithan
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be
time consuming, and in an attempt to minimize this time, our project is a parallel implementation of KMeans
clustering algorithm on CUDA using C. We present the performance analysis and implementation
of our approach to parallelizing K-Means clustering.
NERSC is the production high-performance computing (HPC) center for the United States Department of Energy (DOE) Office of Science. The center supports over 6,000 users in 600 projects, using a variety of applications in materials science, chemistry, biology, astrophysics, high energy physics, climate science, fusion science, and more.
NERSC deployed the Cori system on over 9,000 Intel® Xeon Phi™ processors. This session describes the optimization strategy for porting codes that target traditional manycore architectures to the processors. We also discuss highlights and lessons learned from the optimization process on 20 applications associated with the NERSC Exascale Science Application Program (NESAP).
byteLAKE's expertise across NVIDIA architectures and configurationsbyteLAKE
AI Solutions for Industries | Quality Inspection | Data Insights | AI-accelerated CFD | Self-Checkout | byteLAKE.com
byteLAKE: Empowering Industries with AI Solutions. Embrace cutting-edge technology for advanced quality inspection, data insights, and more. Harness the potential of our CFD Suite, accelerating Computational Fluid Dynamics for heightened productivity. Unlock new possibilities with Cognitive Services: image analytics for precise visual inspection for Manufacturing, sound analytics enabling proactive maintenance for Automotive, and wet line analytics for the Paper Industry. Seamlessly convert data into actionable insights using Data Insights' AI module, enabling advanced predictive maintenance and risk detection. Simplify Restaurant and Retail operations with our efficient self-checkout solution, recognizing meals and groceries and elevating customer satisfaction. Custom AI Development services available for tailored solutions. Discover more at www.byteLAKE.com.
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...inside-BigData.com
In this deck from PASC18, Robert Searles from the University of Delaware presents: Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures.
"Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages and programming models among other components in order to migrate large scale applications and explore parallelism on these machines. Although directive-based programming models allow programmers to worry less about programming and more about science, expressing complex parallel patterns in these models can be a daunting task especially when the goal is to match the performance that the hardware platforms can offer. One such pattern is wavefront. This paper extensively studies a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling.
We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP and OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation. Our experimental platform includes SummitDev, an ORNL representative architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an abstract parallelism model that is architecture independent, with a goal of creating software abstractions that can be used by applications employing the wavefront sweep motif."
Watch the video: https://wp.me/p3RLHQ-iPU
Read the Full Paper: https://doi.org/10.1145/3218176.3218228
and
https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Achitecture Aware Algorithms and Software for Peta and Exascaleinside-BigData.com
Jack Dongarra from the University of Tennessee presented these slides at Ken Kennedy Institute of Information Technology on Feb 13, 2014.
Listen to the podcast review of this talk: http://insidehpc.com/2014/02/13/week-hpc-jack-dongarra-talks-algorithms-exascale/
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)byteLAKE
See our presentation from the 6th International EULAG Users Workshop. We talked about taking HPC to the "Industry 4.0" by implementing smart techniques to optimize the codes in terms of performance and energy consumption. It explains how Machine Learning can dynamically optimize HPC simulations and byteLAKE's software autotuning solution.
Find out more about byteLAKE at: www.byteLAKE.com
Assisting User’s Transition to Titan’s Accelerated Architectureinside-BigData.com
Oak Ridge National Lab is home of Titan, the largest GPU accelerated supercomputer in the world. This fact alone can be an intimidating experience for users new to leadership computing facilities. Our facility has collected over four years of experience helping users port applications to Titan. This talk will explain common paths and tools to successfully port applications, and expose common difficulties experienced by new users. Lastly, learn how our free and open training program can assist your organization in this transition.
Artificial Neural Networks for Storm Surge Prediction in North CarolinaAnton Bezuglov
Feedforward Artificial Neural network (FF ANN) for storm surge prediction in North Carolina. Presentation at Coastal Resilience Center by Anton Bezuglov, Ph.D. Usage of TensorFlow and Python with links to the code on GitHub.
In this video from the 2015 Stanford HPC Conference, Pavel Shamis from ORNL presents: Preparing OpenSHMEM for Exascale.
"OpenSHMEM is a partitioned global address space (PGAS) one-sided communications library that enables remote memory access (RMA) across processing elements (PEs). Its API allows data to be transferred from one PE memory space to another PE’s symmetric memory space; decoupling the data transfers from synchronizations. OpenSHMEM is useful for applications that are latency driven or that have irregular communication patterns, because its one-sided API can be mapped very efficiently to hardware (e.g. RDMA interconnects, etc), and its one-sided programming model helps the overlapping of communication with computation. Summit is Oak Ridge National Laboratory’s next high performance supercomputer system that will be based on a many core/GPU hybrid architecture. In order to prepare OpenSHMEM for future systems, it is important to enhance its programming model to enable efficient utilization of the new hardware capabilities (e.g. massive multithreaded systems, accesses different type memories, next generation of interconnects, etc). This session will present recent advances in the area of OpenSHMEM extensions, implementations, and tools.”
Watch the video: http://insidehpc.com/2015/02/video-preparing-openshmem-for-exascale/
See more talks in the Stanford HPC Conference Video Gallery: http://wp.me/P3RLHQ-dOO
Talk @ APT Group, University of Manchester, 06 August 2014
Abstract:
Nowadays HPC systems, such as those in the Top500, are equipped with a range of different processors, from multi-core CPUs to GPUs. Programming them can be a tough job, specially if we want to squeeze every last FLOPs of performance out of them.
As a Phd Student, I am now doing a brief research visit in the APT group, working in topics related to the programmability and efficient use of GPUs and many-core coprocessors. In particular, I am implementing a large database operation using OpenCL in these state-of-the-art systems. In this talk I will summarize my work in Manchester and discuss the future work in this topic.
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...Databricks
In this talk, we evaluate training of deep recurrent neural networks with half-precision floats on Pascal and Volta GPUs. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution across multiple GPU nodes and making use of high-speed interconnects. We introduce a learning rate schedule facilitating neural network convergence at up to O(100) workers.
Strong scaling tests performed on GPU clusters show linear runtime scaling and logarithmic communication time scaling for both single and mixed precision training modes. Performance is evaluated on a scientific dataset taken from the Joint European Torus (JET) tokamak, containing multi-modal time series of sensory measurements leading up to deleterious events called plasma disruptions. Half-precision significantly reduces memory and network bandwidth, allowing training of state-of-the-art models with over 70 million trainable parameters while achieving a comparable test set performance as single precision.
The Libre-SOC Project aims to create an entirely Libre-Licensed, transparently-developed fully auditable Hybrid 3D CPU-GPU-VPU, using the Supercomputer-class OpenPOWER ISA as the foundation.
Our first test ASIC is a 180nm "Fixed-Point" Power ISA v3.0B processor, 5.1mm x 5.9mm, as a proof-of-concept for the team, whose primary expertise is in Software Engineering. Software Engineering training brings a radically different approach to Hardware development: extensive unit tests, source code revision control, automated development tools are normal. Libre Project Management brings even more: bug trackers, mailing lists, auditable IRC logs and a wiki are standard fare for Libre Projects that are simply not normal Industry-Standard practice.
This talk therefore goes through the workflow, from the original HDL through to the GDS-II layout, showing how we were able to keep track of the development that led to the IMEC 180nm tape-out in July 2021. In particular, by following a parallel development process involving "Real" and "Symbolic" Cell Libraries, developed by Chips4Makers, will be shown how our developers did not need to sign a Foundry NDA, but were still able to work side-by-side with a University that did. With this parallel development process, the University upheld their NDA obligations, and Libre-SOC were simultaneously able to honour its Transparency Objectives.
Workload Transformation and Innovations in POWER Architecture Ganesan Narayanasamy
IT Industry is going through two major transformations. One is adaption of AI and tight integration of the same in the commercial applications and enterprise workflow. Two the transformation in software architecture through the concepts like microservices and the cloud native architecture. These transformation alongside the aggressive adaption of IoT/mobile and 5G in all our day today activities is making the world operate in more real time manner which opens-up a new challenge to improve the hardware architecture to adapt to these requirements. These above two major transformation pushes the boundary of the entire systems stack making the designer rethink hardware. This talk presents you a picture of how the enterprise Industry leading POWER architecture is transforming to fulfill the performance demands of these newer generation workloads with primary focus on the AI acceleration on the chip.
July 16th 2021 , Friday for our newest workshop with DoMS, IIT Roorkee, Concept to Solutions using OpenPOWER Stack. It's time to discover advances in #DeepLearning tools and techniques from the world's leading innovators across industries, research, and public speakers.
Register here:
https://lnkd.in/ggxMq2N
This presentation covers two uses cases using OpenPOWER Systems
1. Diabetic Retinopathy using AI on NVIDIA Jetson Nano: The objective is to classify the diabetic level solely on retina image in a remote area with minimum doctor's inference. The model uses VGG16 network architecture and gets trained from scratch on POWER9. The model was deployed on the Jetson Nano board.
1. Classifying Covid positivity using lung X-ray images: The idea is to build ML models to detect positive cases using X-ray images. The model was trained on POWER9, and the application was developed using Python.
IBM Bayesian Optimization Accelerator (BOA) is a do-it-yourself toolkit to apply state-of-the-art Bayesian inferencing techniques and obtain optimal solutions for complex, real-world design simulations without requiring deep machine learning skills. This talk will describe IBM BOA, its differentiation and ease of use, and how researchers can take advantage of it for optimizing any arbitrary HPC simulation.
This presentation covers various partners and collaborators who are currently working with OpenPOWER foundation ,Use cases of OpenPOWER systems in multiple Industries , OpenPOWER Workgroups and OpenCAPI features .
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
Everything is changing from Health Care to the Automotive markets without forgetting Financial markets or any type of engineering everything has stopped being created as an individual or best-case scenario a team effort to something that is being developed and perfectioned by using AI and hundreds of computers.And even AI is something that we no longer can run in a single computer, no matter how powerful it is. What drives everything today is HPC or High-Performance Computing heavily linked to AI In this session we will discuss about AI, HPC computing, IBM Power architecture and how it can help develop better Healthcare, better Automobiles, better financials and better everything that we run on them
Macromolecular crystallography is an experimental technique allowing to explore 3D atomic structure of proteins, used by academics for research in biology and by pharmaceutical companies in rational drug design. While up to now development of the technique was limited by scientific instruments performance, recently computing performance becomes a key limitation. In my presentation I will present a computing challenge to handle 18 GB/s data stream coming from the new X-ray detector. I will show PSI experiences in applying conventional hardware for the task and why this attempt failed. I will then present how IC 922 server with OpenCAPI enabled FPGA boards allowed to build a sustainable and scalable solution for high speed data acquisition. Finally, I will give a perspective, how the advancement in hardware development will enable better science by users of the Swiss Light Source.
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsGanesan Narayanasamy
As the adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity and integration with existing workflows. Governing Enterprise data, scaling AI model development, selecting a complete, collaborative hybrid platform and tools for rapid solution deployments are key focus areas for growing data scientist teams tasked to respond to business challenges. This talk will cover the challenges and innovations for AI at scale for the Industires such as Healthcare and Automotive , the AI ladder and AI life cycle and infrastructure architecture considerations.
This talk gives an introduction about Healthcare Use cases - The AI ladder and Lifestyle AI at Scale Themes The iterative nature of the workflow and some of the important components to be aware in developing AI health care solutions were being discussed. The different types of algorithms and when machine learning might be more appropriate in deep learning or the other way will also be discussed. Use cases in terms of examples are also shared as part of this presentation .
Healthcare has became one of the most important aspects of everyones life. Its importance has surged due to the latests outbreaks and due to this latest pandemic it has become mandatory to collaborate to improve everyones Healthcare as soon as possible.
IBM has reacted quickly sharing not only its knowledge but also its Artificial Intelligence Supercomputers all around the world.
Those Supercomputers are helping to prevail this outbreak and also future ones.
They have completely different features compared to proposals from other players of this Supercomputers market.
We will try to make a quick look at the differences of those AI focused Supercomputers and how they can help in the R&D of Healthcare solutions for everyone, from those ones with access to a big IBM AI Supercomputer to those ones with access to only one small IBM AI focused server.
Healthcare has became one of the most important aspects of everyones life. Its importance has surged due to the latests outbreaks and due to this latest pandemic it has become mandatory to collaborate to improve everyones Healthcare as soon as possible.
IBM has reacted quickly sharing not only its knowledge but also its Artificial Intelligence Supercomputers all around the world.
Those Supercomputers are helping to prevail this outbreak and also future ones.
They have completely different features compared to proposals from other players of this Supercomputers market.
We will try to make a quick look at the differences of those AI focused Supercomputers and how they can help in the R&D of Healthcare solutions for everyone, from those ones with access to a big IBM AI Supercomputer to those ones with access to only one small IBM AI focused server.
Moving object recognition (MOR) corresponds to the localization and classification of moving objects in videos. Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial sites monitoring, detection-based tracking, autonomous vehicles, etc. In this session, Murari provided a poster about the deep learning algorithms to identify both locations and corresponding categories of moving objects with a convolutional network. The challenges in developing such algorithms have been discussed.
Clarisse Hedglin from IBM presented this as part of 3 days International Summit .. She shared the scenarios AI can solve for today using the IBM AI infrastructure.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
UiPath Test Automation using UiPath Test Suite series, part 5
Targeting GPUs using OpenMP Directives on Summit with GenASiS: A Simple and Effective Fortran Experience
1. ORNL is managed by UT-Battelle
for the US Department of Energy
Targeting GPUs using OpenMP
Directives on Summit with
GenASiS: A Simple and Effective
Fortran Experience
Reuben D. Budiardja
Scientific Computing Group,
Oak Ridge Leadership Computing Facility,
Oak Ridge National Laboratory
Christian Cardall
Physics Division,
Oak Ridge National Laboratory
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak
Ridge National Laboratory, which is supported by the Office of Science of the U.S.
Department of Energy under Contract No. DE-AC05-00OR22725.
2. The Application
General Astrophysics Simulation System (GenASiS)
• Designed for parallel, large scale simulations
– weak-scale to ~100 thousands MPI processes
• Written entirely in modern Fortran (2003, 2008)
• Modular, object-oriented design, and extensible
• Multi-physics solvers:
– (Magneto)-hydrodynamics (HLL, HLLC solvers)
– Explicit 2nd order time-integration
– Self-gravity, polytropic & nuclear EoS
– Grey and spectral neutrino transport
• CPU only code with OpenMP for threading (prior to this work)
3. The Application
• Studied the role fluid instabilities ---
convection and Standing Accretion Shock
Instability (SASI) --- in supernova dynamics
• Discovered exponential magnetic field
amplification by SASI in progenitor star
→ origin of neutron star magnetic fields
• Refactored to three major subdivisions:
Basics, Mathematics, Physics → allowing
unit testing, ad-hoc/standalone tests,
mini-apps
4. Paths to Targeting GPU
• CUDA
– requires rewrite of all computational kernels
– loss of Fortran semantics (multi-d arrays, pointer/array remapping)
– requires interfacing with the rest of the (Fortran) code
• CUDA Fortran
– non standard extension to Fortran (XL, PGI)
– cannot easily fall back to standard Fortran
• Directives (OpenMP)
– retain Fortran semantics
– OpenMP 4.5 has excellent support from IBM XL (Summit), CCE (Titan)
• with excellent support for modern Fortran
5. Lower-Level GenASiS Functionality
• Fortran wrappers to OpenMP APIs
– call AllocateDevice(Value, D_Value)
→ omp_target_alloc()
call AssociateHost(D_Value, Value)
→ omp_target_associate_ptr()
call UpdateDevice(Value, D_Value),
call UpdateHost(Value, D_Value)
→ omp_target_memcpy()
Value : Fortran array
D_Value : type(c_ptr), GPU
address
● Affirmative control of data movement
● Persistent memory allocation on the device
6. Higher-level GenASiS Functionality
• StorageForm :
– a class for data and metadata; the ‘heart’ of data storage facility in GenASiS
– metadata includes units, variable names (for I/O, visualization)
– used to group together a set of related physical variables (e.g. Fluid)
– render more generic and simplified code for I/O, ghost exchange,
prolongation & restriction (AMR mesh)
• Data: StorageForm % Value ( nCells, nVariables )
• Methods:
– call StorageForm % Initialize ( ) ← allocate data on host
– call StorageForm % AllocateDevice ( ) ← allocate data on GPU
– call StorageForm % Update{Device,Host} ( ) ← transfer data
7. Sidebar: OpenMP Memory for Offload
• OpenMP maps host (CPU) variables to device (GPU) (explicitly or
implicitly)
– default copy to-from
• Presence check: if fail, new variable is created on device
– sometime requires explicit association to avoid unintentional data
movement
8. Offloading Computational Kernel
call F % Initialize &
([nCells, nVariables])
call F % AllocateDevice ( )
call F % UpdateDevice ( )
call AddKernel &
( F % Value ( :, 1 ),
F % Value ( :, 2 ), &
F % D_Value ( 1 ),
F % D_Value ( 2 ), &
F % D_Value ( 3 ),
F % Value ( :, 3 ) )
9. Example of Kernel with Pointer Remapping
real ( KDR ), dimension ( :, :, : ), pointer
:: V, dV
V ( −1:nX+2, −1:nY+2, −1:nZ+2 ) => F % Value ( : , iV )
dV ( −1:nX+2 , −1:nY+2 , −1:nZ+2 ) => dF % Value ( : , iV )
call ComputeDifferences_X ( V, F % D_Value ( iV), ... )
19. Beyond OpenMP: Using Pinned Memory
To optimize data transfers:
• pinned memory: page-locked host memory allocated using
cudaMallocHost() or cudaHostAlloc()
• created another Fortran wrapper in GenASiS, used in StorageForm
initialization method as an option
StorageForm % Initialize ( …, PinnedOption = .true.)
• No mechanism to do this with OpenMP 4.5 (but perhaps in 5.x)
20. Performance Results: Using Pinned Memory
Speedups of 1.7 - 2.0X when
pinned memory is used →
overall speedups of over 9X
from 7 CPU threads
21. Implications (Then and Now)
• c. 2010: 10243
cells with 64000 processors (Jaguar), ~3s per
timestep
Now: 64 GPUs (11 nodes) on Summit, ~1.2s per timestep
• Enable us to do higher-fidelity simulations, ensemble studies for
trends in observables
– plan to perform ~200 2D grey transport supernova simulations, tens of 3D
grey transport, and a handful of 3D spectral transport simulations
• First step towards full Boltzmann radiation transport (6-D problem +
time) with exascale computing
22. Remaining Issues and Future Work
• A single code-base with OpenMP for multi-threading and offload
– Fall back to multi-threading with target if-clause is problematic
– team distribute directive introduce deleterious effects for multi-threading
• Kernel-launch parallelism and CUDA streams
– no mechanism within OpenMP to affect and select stream
• Better compilers support for OpenMP 4.5 - 5.x
• Using CUDA-aware MPI for GPU Direct
– benefit (vs. manual staging on host) depends on message sizes
23. Conclusion
• Using OpenMP allows for a simple and effective porting of Fortran
code to target GPU
– 2 - 3 months efforts for this project