The document summarizes Larry Smarr's presentation to the 4th National Research Platform (4NRP) workshop. It provides an overview of NRP's Nautilus, a multi-institution hypercluster connected by optical networks across 25 partner campuses; as of 2023, Nautilus comprised ~200 FIONA computing nodes and 4000 TB of rotating storage. The document highlights several large research projects from different domains that utilize Nautilus, including particle physics, radio astronomy, biomedical applications, earth sciences, and visualization. These applications demonstrate how Nautilus enables data-intensive and collaborative multi-campus research at national scale.
Larry Smarr - NRP Application Drivers
1. “NRP Application Drivers”
Presentation
4th National Research Platform (4NRP) Workshop
February 9, 2023
Dr. Larry Smarr
Founding Director Emeritus, California Institute for Telecommunications and Information Technology;
Distinguished Professor Emeritus, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. 2023: NRP’s Nautilus is a Multi-Institution, National- to Global-Scale Hypercluster Connected by Optical Networks
~200 FIONAs on 25 Partner Campuses, Networked Together at 10-100 Gbps
4000 TB of Rotating Storage
Feb 9, 2023
5. 2022 Nautilus Namespace Users: Largest User is One Million Times Smallest!
Nautilus Namespaces Using >10 GPU-hrs/year or >10 CPU-hrs/year include: osg-opportunistic, ucsd-haosulab, osg-icecube, ucsd-ravigroup, cms-ml, braingeneers, wifire-quicfire, and digits
I Will Look in Detail at the Namespaces in Red
6. The New Pacific Research Platform Video Highlights 3 Different Applications Out of 800 Nautilus Namespace Projects
Pacific Research Platform Video:
https://nationalresearchplatform.org/media/pacific-research-platform-video/
7. 2015 PRP Grant Was Science-Driven: Connecting Multi-Campus Application Teams and Devices
[Slide map shows application areas, including Earth Sciences, across campuses such as UC San Diego, UC Berkeley, and UC Merced]
What Are the Largest 2022 PRP Users in Each Area?
8. The Open Science Grid (OSG) Has Been Integrated With the PRP
OSG Federates ~100 Clusters Worldwide: In Aggregate ~200,000 Intel x86 Cores Used by ~400 Projects
All OSG User Communities Use HTCondor for Resource Orchestration
Distributed OSG Petabyte Storage Caches at SDSC, U.Chicago, FNAL, and Caltech
Source: Frank Würthwein, OSG Exec Director; PRP co-PI; UCSD/SDSC
9. The Open Science Grid (OSG) Delivers 2.6 Billion Core-Hours Per Year of Distributed High Throughput Computing to Over 50 Fields of Science, Including CMS and ATLAS
For Comparison, NCSA Delivered ~35,000 Core-Hours Per Year in 1990
PRP’s Nautilus Appears as Just Another OSG Resource
https://gracc.opensciencegrid.org/dashboard/db/gracc-home
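To make the scale comparison on this slide concrete, here is a quick back-of-the-envelope ratio, a sketch using only the two figures quoted on the slide:

```python
# Back-of-the-envelope comparison of the two core-hour figures quoted above.
osg_core_hours_per_year = 2.6e9   # OSG distributed high-throughput computing, per the slide
ncsa_1990_core_hours = 35_000     # NCSA's approximate annual delivery in 1990, per the slide

growth_factor = osg_core_hours_per_year / ncsa_1990_core_hours
print(f"OSG delivers roughly {growth_factor:,.0f}x the core-hours NCSA delivered in 1990")
# prints: OSG delivers roughly 74,286x the core-hours NCSA delivered in 1990
```

That is nearly five orders of magnitude of growth in aggregate scientific computing throughput over three decades.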
10. Nautilus Namespace osg-opportunistic Supported a Wide Set of Applications as the Largest Consumer of CPU Core-Hours in 2022
3.7 Million CPU Core-Hours, Peaking at 3,500 CPU Cores
osg-opportunistic runs fully in low-priority mode, using only PRP CPU cycles that would otherwise be unused.
Source: Igor Sfiligoi, SDSC
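The low-priority, backfill-style policy described here can be illustrated with a toy scheduler: opportunistic jobs get only the cores that regular demand leaves idle, and are squeezed out core-for-core as regular demand rises. This is a conceptual sketch, not the actual scheduling policy Nautilus uses.

```python
# Toy backfill scheduler: opportunistic jobs use idle cores only.
# Conceptual sketch -- not the real Nautilus scheduling policy.

def schedule(total_cores, regular_demand, opportunistic_demand):
    """Return (regular_alloc, opportunistic_alloc) for one scheduling step."""
    regular_alloc = min(regular_demand, total_cores)
    idle = total_cores - regular_alloc
    opportunistic_alloc = min(opportunistic_demand, idle)  # only unused cycles
    return regular_alloc, opportunistic_alloc

# Hypothetical 5000-core cluster: as regular demand grows,
# opportunistic capacity shrinks to zero.
for demand in (1000, 4000, 5000):
    reg, opp = schedule(5000, demand, opportunistic_demand=3500)
    print(demand, reg, opp)
# prints: 1000 1000 3500 / 4000 4000 1000 / 5000 5000 0
```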
12. Bringing Machine Learning to Particle Physics
A new particle was discovered in 2012.
The “holy grail” of the LHC program today is measurement of di-Higgs production to infer the hhh coupling 𝛌 that determines the Higgs potential.
Source: Frank Wuerthwein, SDSC
13. ML Inference as a Service on NRP
Raghav Kansal (graduate student, UCSD) runs ~1,000 CPU jobs calling out to ~10 GPUs on NRP for inference for his ML model for the hh search: 80M events inferenced, sending 1.3 TB of data from CPUs to GPUs in 3 hours (~200 MB/s input to GPUs, ~4 MB/s output from GPUs).
The ML model is too large to fit into the DRAM of the CPUs, so the fastest way to get the job done is “ML Inference as a Service” on NRP.
See Talk by Shih-Chieh Hsu, 4NRP Friday
Source: Frank Wuerthwein, SDSC
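The CPU-to-GPU call-out pattern on this slide can be sketched in a few lines. Everything here is illustrative: `gpu_inference_server` stands in for whatever remote GPU endpoint the real workflow uses, and is not the actual NRP setup.

```python
# Sketch of "ML inference as a service": many CPU jobs stream event
# batches to a few shared GPU servers and keep only the small
# per-event outputs. All names are hypothetical stand-ins.

def gpu_inference_server(batch):
    """Stand-in for the remote GPU endpoint hosting the large model
    (too big for the CPU workers' DRAM). Returns one score per event."""
    return [sum(event) for event in batch]  # placeholder "model"

def cpu_worker(events, batch_size=1024):
    """One of the ~1,000 CPU jobs: send events in batches, collect scores."""
    scores = []
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        scores.extend(gpu_inference_server(batch))  # a network call in reality
    return scores

events = [[1.0] * 16 for _ in range(5000)]  # toy stand-in for collision events
scores = cpu_worker(events)
print(len(scores))  # 5000
```

The key point the slide makes survives in the sketch: inputs to the GPUs are large (full events) while outputs are small (one score per event), so the GPU tier can be shared by many CPU jobs.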
14. Namespace cms-ml Was the 4th Largest Consumer of Nautilus GPU-Hours in 2022
157,571 GPU-Hours, Peaking at 130 GPUs
PI Frank Wuerthwein, UCSD
16. Co-Existence of Interactive and Non-Interactive Computing on PRP
GPU Simulations Needed to Improve the Ice Model => Results in Significant Improvement in Pointing Resolution for Multi-Messenger Astrophysics
NSF Large-Scale Observatories Are Using PRP and OSG as a Cohesive, Federated, National-Scale Research Data Infrastructure
IceCube Peaked at 560 GPUs in 2022!
17. Namespace osg-icecube Was the Largest Consumer of Nautilus GPU-Hours in 2022
0.8 Million GPU-Hours, Peaking at 560 GPUs
osg-icecube also runs fully in low-priority mode, using only PRP GPU cycles that would otherwise be unused.
[Charts: OSG GPU Consumers; OSG GPU Providers]
In 2022 IceCube was the Largest Consumer of OSG GPU-Hours, and PRP was the Largest Supplier of GPU-Hours to OSG
https://gracc.opensciencegrid.org/d/ujFlp3vVz/gpu-payload-jobs
18. Laser Interferometer Gravitational-Wave Observatory (LIGO) Uses Nautilus/OSG Data Cyberinfrastructure
• LIGO Runs Their Production Rucio Data Management System on Nautilus
– Rucio is the De-Facto Data Management System for Many Large Instruments: LIGO, LHC, …
– LIGO Continues to be One of the Major Users of the OSG Caching Infrastructure (a.k.a. StashCache), Which is Deployed Mostly as PRP-Managed Kubernetes Pods
• LIGO Does Not Use Much PRP Compute, Given Their Dedicated Infrastructure
19. PRP Supports Radio Telescopes Through Partnering with CASPER: the Collaboration for Astronomy Signal Processing and Electronics Research
The CASPER Collaboration of ~1000 Members and 50 Radio-Astronomy Instruments Worldwide Develops Open-Source Signal Processing and Instrumentation Pipelines, Primarily Using FPGAs and GPUs. https://casper.berkeley.edu/
Radio Telescopes include:
• Event Horizon Telescope
• Square Kilometer Array
• Very Large Array
PRP Access Has Allowed CASPER to Expand in Several Aspects:
• PRP Portal to CASPER Tools/Libraries Was Developed by PRP’s John Graham
• The PRP Team Added FPGAs to Nautilus FIONAs with the CASPER Software Stack
• Nautilus JupyterHub Used for FPGA Training
• Optical Fiber Connected Data Storage
[Slide also lists: Xilinx, Intel, Fujitsu, HP, Nvidia, NSF, NASA, NRAO, NAIC]
Source: Dan Werthimer, SETI Chief Scientist, UC Berkeley (SETI.berkeley.edu, CASPER.berkeley.edu)
20. PRP Portal to CASPER Tools/Libraries, Developed by PRP’s John Graham, UCSD
CASPER designs, compiles, tests, and evaluates instrumentation on the PRP, then deploys dedicated FPGA and GPU clusters at the observatories.
See John Graham’s CASPER 2021 Workshop Talk and Tutorial:
https://casper.berkeley.edu/index.php/casper-workshop-2021/agenda/
21. Discoveries Made with CASPER-Enabled Instrumentation
• Radio Image of a Black Hole
• Fast Radio Bursts
• Weighing the Universe
• Pulsar Timing
• Gravitational Waves
• Diamond Planet
• Prostheses Control
• Neutron Imaging
Source: Dan Werthimer, UC Berkeley
23. OpenForceField Uses OPEN Software, OPEN Data, OPEN Science
and PRP to Generate Quantum Chemistry Datasets for Druglike Molecules
www.openforcefield.org
OFF Open-Source Models are Used in Drug Discovery,
Including in COVID-19 Computing on Folding@Home.
24. OFF Runs Quantum Mechanical Computations on Many Molecules
to Determine Their Optimized Force Fields
25. 50% of OFF compute is run on Nautilus.
PRP is Capable of Running Millions of Quantum Chemistry Workloads
www.openforcefield.org
[Timeline: OpenFF Begins Using Nautilus → OpenFF-1.0.0 Released → OpenFF-2.0.0 Released]
We run "workers" that pull down QC jobs
for computation from a central project queue.
These jobs require between minutes and hours,
and results are uploaded to the
central, public QCArchive server.
Workers are deployed from Docker images and
scheduled on PRP's Kubernetes system. Due to
the short job duration, these deployments can still
be effective if interrupted every few hours.
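The pull-based worker model described above can be sketched in a few lines. This is a generic illustration, not OpenFF's actual worker code: `task_queue`, `compute`, and `upload` are hypothetical stand-ins for the central project queue, the QC calculation, and the upload to the public QCArchive server.

```python
import queue


def run_worker(task_queue, compute, upload, max_idle=1):
    """Pull QC jobs from a central queue, compute them, upload results.

    Because each job is independent and short-lived, a preempted
    Kubernetes pod loses at most the job in flight; on restart the
    worker simply resumes pulling from the queue.
    """
    results = []
    while True:
        try:
            job = task_queue.get(timeout=max_idle)
        except queue.Empty:
            break  # no work left; the pod can exit cleanly
        result = compute(job)   # minutes-to-hours QC calculation
        upload(result)          # push to the central, public archive
        results.append(result)
        task_queue.task_done()
    return results
```

A usage sketch: fill a `queue.Queue` with molecule identifiers, pass a `compute` callable and an `upload` callback, and the worker drains the queue until it is empty.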
26. OFF Was the Top Nautilus CPU Core Consumer
in 2020 & 2021, 4th Highest in 2022
7.6 Million CPU Core-Hours
(2020-2022)
Peaking at 1300 CPU Cores
OFF Datasets Consist of Hundreds to Millions of Jobs,
Each Requiring Tens to Thousands of CPU-Hours and 8-32 GB of RAM
27. Dataset listing: https://qcarchive.molssi.org/apps/ml_datasets/
Python example notebooks for data access: https://qcarchive.molssi.org/examples/
OpenFF’s dataset lifecycle: https://github.com/openforcefield/qca-dataset-submission/projects/1
The OFF Datasets on QCArchive
are Fully Open!
28. Nautilus Namespace tempredict Utilized PRP to Compute
COVID-19 and Vaccine Responses for ~65K Participants
Purawat et al., IEEE Big Data, 2021
Mason et al., Sci Rep, 2021
Mason et al., Vaccines, 2022
Source: Prof. Benjamin Smarr, UCSD
29. Nautilus Namespace braingeneers: One of the Most Advanced PRP projects -
Uses Optical Fiber Connected Shared Storage, CPUs & GPUs
https://cenic.org/blog/prp-boosts-inter-campus-collaboration-on-brain-research
30. UCSC/Hengenlab Data Analysis Pipeline Using PRP
[Pipeline diagram: Hengenlab and WUSL Data → PRP/S3 → PRP Compute (CNN) → Results]
Source: David Parks, UCSC; braingeneers PI David Haussler
31. Sampling Strategy for braingeneers TB+ Data:
• Multiple Worker Processes Circulate Data in a 50GB Cache
• PRP Compute Jobs Use Local NVMe
• Model Training Operates on the Local Cache
• Results Are Returned to PRP/S3
Source: David Parks, UCSC; braingeneers PI David Haussler
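The cache-and-sample pattern on the slide above can be sketched as a bounded local cache fed from object storage. This is a minimal illustration, not the braingeneers code: `fetch` is a hypothetical stand-in for an S3 GET, and `capacity` models the ~50GB NVMe cache, counted here in items for simplicity.

```python
import random
from collections import OrderedDict


class LocalSampleCache:
    """Bounded, NVMe-style local cache fed from object storage (sketch)."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # e.g. an S3 GET (stand-in here)
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> data, least recent first

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)        # keep hot samples resident
        else:
            self.cache[key] = self.fetch(key)  # pull from S3 on a miss
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least-recently-used
        return self.cache[key]


def sample_batch(cache, keys, batch_size, rng=random):
    """Draw a random training batch; misses stream in from storage."""
    return [cache.get(k) for k in rng.sample(keys, batch_size)]
```

Training then operates only on cache hits most of the time, so the TB+ dataset never needs to fit on the compute node.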
32. UCSC, UCSF & WUSL Are Collaborating
To Grow Human Cerebral Organoids and Measure Their Neural Activity
[Instrumentation: Tetrodes, Multi-Electrode Arrays, and Silicon Probes]
Source: David Parks, UCSC; braingeneers PI David Haussler
33. Goal: For Every Human Brain Slice, Grow 1000 Organoids,
And For Every Organoid, Compute 1000 Simulated Organoids
From Neural Activity in the Living Mouse Brain
to Neural Activity in Human Brain Organoids
Source: David Parks, UCSC; braingeneers PI David Haussler
34. Nautilus Namespace braingeneers
Was The 3rd Largest Consumer of CPU Core-Hours in 2022
57,000 GPU-Hours
Peaking at 110 GPUs
950,000 CPU Core-Hours
Peaking at 2000 CPU Cores
https://braingeneers.ucsc.edu/team/
35. NeuroKube: An Automated Neuroscience Reconstruction Framework
Uses Nautilus for Large-Scale Processing & Labeling of Neuroimage Volumes
Figures 2, 4, & 5 in “NeuroKube:
An Automated and Autoscaling Neuroimaging Reconstruction Framework
Using Cloud Native Computing and A.I.,”
Matthew Madany, et al. (IEEE Big Data ’20, pp. 320-330)
36. Computer Vision-Based Approach
Provides the Potential to Automatically Generate Labels Using ML
Subset of Neurites from
Cerebellum Neuropil
Extracted & Rendered
in 3D with Structures
of Interest Labeled
Figures 1 & 14 in “NeuroKube:
An Automated and Autoscaling Neuroimaging Reconstruction Framework
Using Cloud Native Computing and A.I.,”
Matthew Madany, et al. (IEEE Big Data ’20, pp. 320-330)
Volumetric Electron Microscopy (VEM)
Data with Colorized Labels
38. NSF-Funded WIFIRE Uses PRP/CENIC to Couple Wireless Edge Sensors
With Supercomputers, Enabling Fire Modeling Workflows
[Workflow diagram: Landscape Data, Real-Time Meteorological Sensors, and Weather Forecasts Feed the WIFIRE Firemap Workflow on PRP, Producing Fire Perimeters]
Source: Ilkay Altintas, SDSC
39. WIFIRE’s Firemap Provides Public Website
Combining Satellite Fire Detections with GIS
SoCal Wildfires Sept 6, 2022
40. PRP is Building on NSF-Funded SAGE Technology
to Bring ML/AI to the Edge For Smoke Plume Detection
Source: Charlie Catlett, Pete Beckman, Argonne National Lab
Source: Ilkay Altintas, SDSC, HDSI
Training Data: Archive of
25,000 Labeled Wireless Camera Images
of Wildland Fires
www.mdpi.com/2072-4292/14/4/1007
PRP namespace digits
41. Nautilus Namespace wifire-quicfire Was the 25th Largest Consumer of CPU Core-Hours in 2022;
digits Was the 14th Largest GPU Consumer
wifire-quicfire
108,000 CPU Core-Hours
Peaking at 360 CPU Cores
digits
40,700 GPU-Hours
Peaking at 18 GPUs
43. 2017: PRP 20Gbps Connection of UCSD SunCAVE and UCM WAVE Over CENIC
2018-2019: Added Their 90 GPUs to PRP for Machine Learning Computations
Leveraging UCM Campus Funds and NSF CNS-1456638 & CNS-1730158 at UCSD
UC Merced WAVE (20 Screens, 20 GPUs) UCSD SunCAVE (70 Screens, 70 GPUs)
See These VR Facilities in Action in the PRP Video
44. PRP Has Been Bringing Machine Learning to Building Virtual Worlds,
Including Robotics and Autonomous Vehicles
• Goal: Train Robots That Can Manipulate Arbitrary Objects
o Open Drawer, Turn Faucet, Stack Cube, Pull Chair,
Pour Water, Pick And Place, Hang Ropes, Make
Dough, …
46. A Major Project in UCSD’s Hao Su Lab
is Large-Scale Robot Learning
• We Build A Digital Twin of The Real World in Virtual Reality (VR)
For Object Manipulation
• Agents Evolve In VR
o Specialists (Neural Nets) Learn Specific Skills
by Trial and Error
o Generalists (Neural Nets) Distill Knowledge
to Solve Arbitrary Tasks
• On Nautilus:
o Hundreds of Specialists Have Been Trained
o Each Specialist is Trained in Millions of Environment Variants
o ~10,000 GPU-Hours per Run
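The specialist-to-generalist step named above is a form of knowledge distillation. The sketch below shows the standard temperature-softened distillation objective as a generic illustration; it is not the Hao Su Lab's actual training code, and all function names here are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by T."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The generalist (student) is trained to match the softened output
    of a specialist (teacher), transferring the skill it learned.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student exactly reproduces the teacher's outputs and grows as the two distributions diverge, which is what drives the generalist toward the specialists' behavior.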
47. UCSD’s Ravi Group: How to Create Visually Realistic
3D Objects or Dynamic Scenes in VR or the Metaverse
Source: Prof. Ravi Ramamoorthi, UCSD
ML Computing Transforms a Series of 2D Images
Into a 3D View Synthesis
48. Machine Learning-Based
Neural Radiance Fields for View Synthesis (NeRFs) Are Transformational!
BY JARED LINDZON
NOVEMBER 10, 2022
A neural radiance field (NeRF) is
a fully-connected neural network
that can generate
novel views of complex 3D scenes,
based on a partial set of 2D images.
https://datagen.tech/guides/synthetic-data/neural-radiance-field-nerf/ Source: Prof. Ravi Ramamoorthi, UCSD
https://youtu.be/hvfV-iGwYX8
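The NeRF definition quoted above (a fully-connected network generating novel views from a partial set of 2D images) feeds input coordinates through a positional encoding before the network. A minimal sketch of that encoding, as a generic illustration of the published NeRF technique rather than the Ravi Group's code:

```python
import math


def positional_encoding(p, num_freqs=4):
    """NeRF-style positional encoding gamma(p).

    Maps each input coordinate x to the sequence
    [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_freqs-1,
    letting the MLP represent high-frequency scene detail.
    """
    out = []
    for x in p:
        for k in range(num_freqs):
            freq = (2 ** k) * math.pi
            out.append(math.sin(freq * x))
            out.append(math.cos(freq * x))
    return out
```

A 3D position encoded with `num_freqs=4` expands to 24 features, which is the kind of input the fully-connected NeRF network consumes.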
49. Namespace ucsd-ravigroup
Consumed the 3rd Most Nautilus GPU-Hours in 2022
200,000 GPU-Hours
Peaking at 122 GPUs
• Much of the compute involves training computationally expensive NeRFs.
• Training time to learn a representation of a single scene on a GPU can vary from seconds to a day.
• NeRFs that can see behind occlusions may require a week of training on 8 GPUs simultaneously.
Source: Alexander Trevithick, UCSD Ravi Group
50. 2022-2026 NRP Future: PRP Federates with
NSF-Funded Prototype National Research Platform
NSF Award OAC #2112167 (June 2021) [$5M Over 5 Years]
PI Frank Wuerthwein (UCSD, SDSC)
Co-PIs Tajana Rosing (UCSD), Thomas DeFanti (UCSD),
Mahidhar Tatineni (SDSC), Derek Weitzel (UNL)