Parallel Implementation of K-Means Clustering on CUDA (prithan)
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be time-consuming, so our project is a parallel implementation of the K-Means clustering algorithm on CUDA using C. We present our approach to parallelizing K-Means clustering and analyze its performance.
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits of Archimedes' constant on Google Cloud" (Kento Aoyama)
(Journal Club at AIS Lab. on April 22, 2019)
Dr. Kashif Rasul from Zalando Research in Berlin gave this presentation on "Multi-GPU for Deep Learning" at the Computer Science, Machine Learning & Statistics Meetup, held at the Zalando adtech lab office in Hamburg on 6 September 2017.
The next generation of the Montage image mosaic engine (G. Bruce Berriman)
Presentation given by Bruce Berriman at the Astronomical Data Analysis Software & Systems XXV (ADASS XXV) Conference, Sydney, Australia, October 29, 2015.
Authors: G. B. Berriman, J.C. Good, B. Rusholme, T. Robitaille.
Faster Practical Block Compression for Rank/Select Dictionaries (Rakuten Group, Inc.)
We present faster practical encoding and decoding procedures for block compression. Such encoding and decoding procedures are important to efficiently support rank/select queries on compressed bit vectors. This paper was presented at the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017) in Palermo, Italy.
The .NET garbage collector can be your best friend or your worst enemy; and it’s not friendly with a lot of people. The GC left more than a few production systems burning in smoke after developers failed to anticipate the effects of real production loads on the memory subsystem. In this talk, we will methodically measure and improve the .NET garbage collector’s performance. We will begin with a quick refresher on dynamic performance tools that can identify GC issues: CLR performance counters, ETW GC events, and ETW object allocation events; as well as static analysis tools, such as the Roslyn-based heap allocations analyzer. Then, we will inspect multiple issues at the source code level: excessive boxing, unintended effects of lambdas closing over local variables, await-generated state machines, intermediate objects in LINQ queries, and many others. We will also discuss higher-level memory problems: how to get rid of large object allocations, how to avoid finalization, and how to convert heap-based designs to local objects. Some of these ideas are now being applied at the language and framework level in C# 7 and .NET Core. At the end of the talk, you will be equipped to reduce memory traffic and GC overhead in your own applications, often by a factor of 10 or more!
SX Aurora TSUBASA (Vector Engine): a Brand-new Vector Supercomputing Power in Server Chassis (inside-BigData.com)
In this deck from the HPC User Forum at Argonne, Deepak Pathania presents: SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in Server Chassis.
"The NEC Vector Engine Processor was developed using 16 nm FinFET process technology for extreme high performance and low power consumption. The Vector Engine Processor has the world's first implementation of one processor with six HBM2 memory modules using Chip-on-Wafer-on-Substrate technology, leading to the world-record memory bandwidth of 1.2 TB/s."
Watch the video: https://wp.me/p3RLHQ-kOK
Learn more: https://www.nec.com/en/global/solutio...
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
I review how to derive Newton's law of universal gravitation from the Weyl strut between two Chazy-Curzon particles. I also briefly review Causal Dynamical Triangulations (CDT), a method for evaluating the path integral of canonical quantum gravity using Regge calculus, with the class of simplicial manifolds evaluated restricted to those with a defined time foliation, thus enforcing a causal structure. I then discuss how to apply the Weyl-strut approach within CDT, in particular modifying the algorithm to keep two simplicial submanifolds with curvature (i.e. mass) a fixed distance from each other, modulo regularized deviations and across all time slices, and how to determine whether CDT produces an equivalent Weyl strut, which can then be used to obtain the Newtonian limit. I wrap up with a brief discussion of computational methods and code development.
In this paper we propose Regularised Cross-Modal Hashing (RCMH), a new cross-modal hashing model that projects annotation and visual feature descriptors into a common Hamming space. RCMH optimises the hashcode similarity of related data-points in the annotation modality using an iterative three-step hashing algorithm: in the first step, each training image is assigned a K-bit hashcode based on hyperplanes learnt at the previous iteration; in the second step, the binary bits are smoothed by a formulation of graph regularisation so that similar data-points have similar bits; in the third step, a set of binary classifiers are trained to predict the regularised bits with maximum margin. Visual descriptors are projected into the annotation Hamming space by a set of binary classifiers learnt using the bits of the corresponding annotations as labels. RCMH is shown to consistently improve retrieval effectiveness over state-of-the-art baselines.
Introduction to GPUs for Machine Learning (Sri Ambati)
Graphics processing units (GPUs) are becoming integral components of modern machine learning engines and platforms. These slides provide an introduction to GPUs and their suitability for machine learning workloads, discuss enabling technologies such as CUDA, and demonstrate GPU-accelerated machine learning with the H2O platform. They are targeted at machine learning practitioners new to GPUs.
Author: Wen Phan is a Senior Solutions Architect at H2O.ai. Wen works with customers and organizations to architect systems, smarter applications, and data products to make better decisions, achieve positive outcomes, and transform the way they do business. Internally, Wen uses his hard-earned field experiences, customer feedback, and market trends to drive product innovation and development. Wen holds a B.S. in Electrical Engineering and M.S. in Analytics and Decision Sciences.
Follow him on twitter: @wenphan
Accelerating HPC Applications on NVIDIA GPUs with OpenACC (inside-BigData.com)
In this deck from the Stanford HPC Conference, Doug Miles from NVIDIA presents: "Accelerating HPC Applications on NVIDIA GPUs with OpenACC."
"OpenACC is a directive-based parallel programming model for GPU accelerated and heterogeneous parallel HPC systems. It offers higher programmer productivity compared to use of explicit models like CUDA and OpenCL.
Application source code instrumented with OpenACC directives remains portable to any system with a standard Fortran/C/C++ compiler, and can be efficiently parallelized for various types of HPC systems – multicore CPUs, heterogeneous CPU+GPU, and manycore processors.
This talk will include an introduction to the OpenACC programming model, provide examples of its use in a number of production applications, explain how OpenACC and CUDA Unified Memory working together can dramatically simplify GPU programming, and close with a few thoughts on OpenACC future directions."
Watch the video: https://youtu.be/CaE3n89QM8o
Learn more: https://www.openacc.org/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
dCUDA: Distributed GPU Computing with Hardware Overlap (inside-BigData.com)
Torsten Hoefler from ETH Zurich presented this deck at the Switzerland HPC Conference.
"Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, which implements device-side remote memory access with target notification. To hide instruction pipeline latencies, CUDA programs over-decompose the problem and over-subscribe the device by running many more threads than there are hardware execution units. Whenever a thread stalls, the hardware scheduler immediately proceeds with the execution of another thread ready for execution. This latency-hiding technique is key to make best use of the available hardware resources. With dCUDA, we apply latency hiding at cluster scale to automatically overlap computation and communication. Our benchmarks demonstrate perfect overlap for memory bandwidth-bound tasks and good overlap for compute-bound tasks."
Watch the video presentation: http://wp.me/p3RLHQ-gCB
A general introduction to GPGPU and an application involving solving large preconditioning problems with Domain Decomposition. Code is available at http://sourceforge.net/projects/cudasolver/ .
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Datasets with millions of events in charm decays at LHCb have prompted the development of powerful fitting and analysis tools capable of handling unbinned datasets using GPUs and multithreaded architectures.
GooFit, the original GPU fitting program with a familiar syntax resembling classic RooFit, has undergone significant redesign and has expanded physics and computing capabilities. The performance has been improved and tested on a variety of systems. GooFit 2.0 is easier than ever to install, develop, and use on any system.
A new templated header-only library, Hydra, provides a highly optimized, general framework for fits, Monte Carlo generation, integration, and more. The design and benefits of this system along with initial tests will be shown.
Finally, a model-independent search for direct CP violation using an unbinned approach called an energy test was performed directly using the Thrust library (which both of the previous packages are based on). Public results from this analysis and performance comparisons will be presented.
In this deck from ATPESC 2019, Jack Dongarra from UT Knoxville presents: Adaptive Linear Solvers and Eigensolvers.
"Success in large-scale scientific computations often depends on algorithm design. Even the fastest machine may prove to be inadequate if insufficient attention is paid to the way in which the computation is organized. We have used several problems from computational physics to illustrate the importance of good algorithms, and we offer some very general principles for designing algorithms. Two subthemes are, first, the strong connection between the algorithm and the architecture of the target machine; and second, the importance of non-numerical methods in scientific computations."
Watch the video: https://wp.me/p3RLHQ-lq3
Learn more: https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Nucleon valence quark distribution functions from Lattice QCD (Christos Kallidonis)
We present results on the nucleon valence quark distribution extracted from Lattice QCD simulations, using a gauge ensemble of $N_f=2+1$ Wilson-Clover fermions with a pion mass of $m_\pi = 350$ MeV and lattice spacing of about $a=0.093$ fm. We obtain reduced Ioffe Time Distributions (rITDs) by computing appropriate matrix elements on the lattice, and elaborate on the extraction of the desired quark distributions from the rITDs following the pseudo-PDF approach. A set of techniques are considered in order to ensure ground state dominance. Theoretical and experimental implications of our calculation are discussed.
The Nucleon Parton Distribution Functions from Lattice QCD (Christos Kallidonis)
In this short talk I present results on key quantities related to the structure of the nucleon, obtained from state-of-the-art Lattice QCD simulations. Results include the nucleon quark contents and the decomposition of the nucleon spin.
Presented at the Early Career Research Symposium 2017 (ECRS 2017), Brookhaven National Laboratory
Hyperon and charmed baryon masses and axial charges from Lattice QCD (Christos Kallidonis)
Poster presented at the Electromagnetic Interactions on Nucleons and Nuclei 2013 (EINN2013) Conference, held in Paphos, Cyprus. We present results on the masses and axial charges of all forty light, strange and charm baryons, obtained from Lattice QCD simulations
Talk presented at the Electromagnetic Interactions of Nucleons and Nuclei 2015 (EINN 2015) conference, Paphos, Cyprus. In this talk we present results on the axial charges of all forty light, strange and charm baryons from Lattice QCD calculations.
Computing the masses of hyperons and charmed baryons from Lattice QCD (Christos Kallidonis)
Poster presented at the Computational Sciences 2013 Conference (Winner of poster competition). We present results on the masses of all forty light, strange and charm baryons from Lattice QCD simulations, focusing particularly on the computational aspects and requirements of such calculations.
Hyperon and charm baryon masses from twisted mass Lattice QCD (Christos Kallidonis)
Talk given at the University of Bonn, Germany. We present results on the masses of all forty light, strange and charm baryons from Lattice QCD simulations. We elaborate on the various methods and techniques followed and examine systematic uncertainties related to isospin breaking effects and finite lattice spacing.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL VELOCITY (Wasswaderrick3)
In this book, we use conservation-of-energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity, and from this we derive the Poiseuille flow equation, the transition flow equation and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our energy-conservation techniques to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation, look at a way of calculating the time taken for a body to fall in a viscous medium, and look at the general equation of terminal velocity.
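For reference, the standard textbook forms the description appeals to (a sketch of the book's targets, not its derivations; symbols: p pressure, ρ density, v velocity, μ dynamic viscosity, L pipe length, d diameter, r sphere radius, ρ_s and ρ_f sphere and fluid densities):

```latex
% Bernoulli equation extended with a viscous loss term
p_1 + \tfrac{1}{2}\rho v_1^2 + \rho g h_1
  = p_2 + \tfrac{1}{2}\rho v_2^2 + \rho g h_2 + \Delta p_{\mathrm{loss}}

% Laminar (Poiseuille) pipe-flow loss for mean velocity \bar{v}
\Delta p_{\mathrm{loss}} = \frac{32\,\mu L \bar{v}}{d^2}

% Stokes terminal velocity of a sphere falling in a viscous medium
v_t = \frac{2 r^2 (\rho_s - \rho_f)\, g}{9\,\mu}
```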
Toxic effects of heavy metals: Lead and Arsenic (sanjana502982)
Heavy metals are naturally occurring metallic chemical elements that have relatively high density and are toxic even at low concentrations. In toxicology, all such toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theology" (Studia Poinsotiana)
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to republish this text, get in touch with the author or the editorial committee of the Studia Poinsotiana. Where possible, we will be happy to put you in contact.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adaptive Optics (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Seminar of U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
This presentation gives a brief overview of the structural and functional attributes of nucleotides and the structure and function of genetic materials, along with the impact of UV rays and pH upon them.
Nucleic Acid: its structural and functional complexity.
Nucleon TMD Contractions in Lattice QCD using QUDA
1. Nucleon TMD Contractions in Lattice QCD using QUDA [1]
Christos Kallidonis, Sergey Syritsyn
Stony Brook University & RIKEN / BNL Research Center
together with the LHP and RBC collaborations
GPU Hackathon, Brookhaven National Laboratory, Sep. 17-21, 2018
Progress Report
Mentors: Kate Clark, Mathias Wagner
with GPU Lattice team: C. Jung, M. Lin, D. Howarth, J. Tu, B. Wang, D. Guo
[1] https://github.com/lattice/quda
(Title-slide background from "Nucleon EDMs on a Lattice at the Physical Point", Sergey N. Syritsyn, LATTICE 2018, East Lansing, MI, July 22-28, 2018; image courtesy of BMW Collaboration)
2. Problem at hand
Degrees of freedom:
• (local) volume sites: x = 1,…,512K
• Ns spin: α,β = 1,…,4
• Nc color: a,b = 1,…,3
• Vector index: k = 1,…,12
• Γ-matrix index: i = 1,…,16
• Complex numbers! x2
Contraction to compute (per site):
C^{(i)}(x) = \sum_k \sum_{\alpha,\beta,a,b} \Gamma^{(i)}_{\alpha\beta} \, U(x)^{ba} \, w_k(x)^a_\alpha \, v^*_k(x)^b_\beta
# cplx multiply-add / site: Nc^2 Ns^2 (1 + Nc Ns) + Ns^3 → 15104 Flops
Inp. mem/site: (2 (Nc Ns)^2 + Nc^2) × cplx = 4752 Bytes
Out. mem/site: Ns^2 × cplx = 256 Bytes
3. Kernel optimization
Iteration-0:
• assign 1 thread/site
• loop over a, b, α, β
• sum over k
• perform trace
Iteration-1:
• QUDA: block/grid auto-tuning functionality
Performance per GPU (1/2 K80): ~ 6 GFlop/s
Memory Bandwidth: ~ 1.9 GB/s
Kernel exec. cost: 6 GPU*sec —> dominant part of workflow
Can do better than that!
(Nvidia Visual Profiler screenshot. Thanks, Mathias!)
4. Kernel optimization
Iteration-2:
• move required buffers to shared memory
• extend the block dim. to 3d - assign color/spin indices to individual threads
• #pragma unroll the (remaining) loops
• inline relevant functions involving Γ-matrices
Kernel exec. cost: 5.2 GPU*sec, x1.15 improvement
Profiler still complains about very high local memory overhead…
5. Kernel optimization
Iteration-3:
• Move Γ-matrices to constant memory - did the trick. Thanks, Kate!
—> previously the compiler could not resolve array indexing, so the buffers spilled to local memory
QUDA auto-tuner report:
Performance: 205 GFlop/s
Memory BW: 65 GB/s
Kernel exec. cost: 0.16 GPU*sec (to compare with 5.2 GPU*sec) — x32 improvement!!
—> Now only 4% of workflow
On-going work:
• can we squeeze more Flop/s?
• optimize communication-intensive code segments
• experiment with env. variables
• update/optimize the rest of the contraction kernels