Amy Walton - NSF’s Computational Ecosystem for 21st Century Science & Engineering
NSF’s Computational Ecosystem for 21st Century Science and Engineering
Amy Walton, Deputy Director
Office of Advanced Cyberinfrastructure
National Science Foundation
Fourth National Research Platform (4NRP) Workshop
September 9, 2023
Topics
• Looking Back – The Pacific Research Platform
• A Productive Experiment
• Moving Targets
• Acknowledgements
• Looking Forward – A National Research Ecosystem
• Challenges
• Opportunities
• Resources
NSF 15-534: Data, Networking, and Innovation
An initial and productive collaboration between two OAC programs:
• Campus Cyberinfrastructure (CC*)
• Data Infrastructure Building Blocks (DIBBs)
Area 1: Multi-Campus/Multi-Institution Model Implementations
Emphasis on integration of data and network infrastructure activities
• Awards served as models for potential future national-scale, network-aware, data-focused cyberinfrastructure.
• Expected to be science-driven, demonstrating a strong and credible connection
to the multi-campus, multi-institutional, and/or regional scientific communities
they serve.
• Emphasized the value of sharing data beyond a specific institution to the wider
science, engineering, and education communities.
Pacific Research Platform: Then and Now
• Goal: Expand the campus Science DMZ network systems
model into a regional model for data-intensive science.
• The PRP data-sharing architecture allowed region-wide
virtual co-location of data with computing.
• PRP site endpoints, devices called Flash I/O Network Appliances (FIONAs), were incorporated into a Kubernetes cluster of FIONAs called Nautilus.
• Data can traverse multiple, heterogeneous networks with
minimal performance degradation.
PRP now uses 11 major regional/national networks and supports:
• 737 namespaces (projects)
• >2,100 users
• Researchers at 94 US campuses in 39 states
Not Mentioned in the Original Proposal:
• Kubernetes
• Containers
• Automation
• Jupyter
• Ceph
These technologies emerged and were integrated into what became Nautilus during the period of the PRP grant.
• Machine Learning
• Artificial Intelligence
• Neutrino Observatory
• COVID
• Wildfires
While all applications listed in the original proposal were addressed, these applications became some of the largest consumers of PRP CPU/GPU resources.
Acknowledgements: Many Contributors
• CHASE-CI [CISE/CNS] 2100237 and 2120019: Additional GPU nodes, expand community
• Expanse 1928224 (ACSS-I): NVIDIA GPUs, cloud integration, composable systems
• Voyager 2005369 (ACSS-II): AI-focused hardware, Intel/Habana tools
• Prototype NRP 2112167 (ACSS-II): Distributed across SDSC, U Nebraska–Lincoln, and MGHPCC
PRP cyberinfrastructure has increased compute capacity through several sources:
• Individual data-intensive research faculty at multiple campuses contributed resources funded by their own grants
• These contributions added ~1/4 of the total GPUs on Nautilus
Today, Nautilus has nearly 20,000 CPU cores and nearly 1,500 GPUs.
• T-NRP 1826967 (CC*): Connect Quilt Regional Networks using CENIC and Internet2
• CHASE-CI [CISE/CNS] 1713149: Cloud of GPUs for faculty to train AI algorithms
Disciplines: Astronomy, Physics, Computational Biology, Materials Science, Evolutionary Biology, Climatology
Looking Forward: Cyberinfrastructure that
Enables Research Across Science Disciplines
Challenges: Large instruments producing big data, requiring big compute, for highly collaborative scientists in different specializations across widely distributed infrastructure that must be available, ensure workflow integrity, and be easy to use while adhering to regulatory or policy requirements.
Data Cyberinfrastructure
• Federal guidance on Open Science and Public Access presents new
opportunities for an agile, scalable and equitable national data
cyberinfrastructure to support data sharing.
• Recent OAC CC* awards provided federated campus storage.
• Requirements: follow NSF data practices; provide a sustainability plan; integrate into networks.
• Future Directions: How can we capitalize on existing investments and achieve a national-scale data CI that supports equitable access to and use of data following FAIR principles?
• Our proposed solution: A loosely federated approach of existing and new repositories and infrastructure that adhere to basic agreed-upon principles.
• Repositories and other data projects that join the network gain benefit from shared
resources and services.
CI Professionals
• A significant barrier to use of national resources is access to CI
professionals who can provide expertise and support that are
responsive to local needs.
• The new ACCESS Computational Science Support Network (CSSN)
provides a framework for engaging, training/mentoring, and
coordinating a network of CI professionals
• The new SCIPE Solicitation (NSF 23-574) supports CI professionals
at the campus or regional level.
• Enables engagement of CI professionals in the ACCESS Computational Science Support Network
• Requires: A plan for mentoring, professional development, and sustainability; and that 20% of each supported individual’s time be dedicated to national activities.
NSF-supported Advanced CI Resources
Leadership-class
• Frontera: U of Texas, Austin
Capacity Systems
• Anvil: Purdue University
• Bridges-2: Carnegie Mellon University
• Delta: U of Illinois, Urbana-Champaign
• Expanse: U of California, San Diego
• Jetstream2: Indiana University + Partners
• Stampede2: U of Texas, Austin
Innovative Prototypes/Testbeds
• Neocortex: Carnegie Mellon University
• Voyager: U of California, San Diego
• Ookami: Stony Brook University
• NRP: U of California, San Diego
• ACES: Texas A&M University
Cloud resources
• CloudBank: U of California, San Diego
• CloudLab: University of Utah
• Chameleon: University of Chicago
Distributed Services
• PATh/OSG: U of Wisconsin, Madison
• ACCESS: Several Partners
Learn how to access resources at access-ci.org
Democratizing Science through Cyberinfrastructure
Broad, fair, and equitable access to
advanced computing is essential to
democratizing science in the 21st
century
• Significant barriers
  • Knowledge: Awareness, discovery, expertise, support
  • Technical: Allocation, access, on-ramps
  • Social: Awareness of the importance of access to CI, rewards structures
• Complex tradeoffs / optimizations
  • Capacity vs. capability
  • Stability vs. innovation
  • Performance vs. ease of use
  • Expert vs. novice
M. Parashar, "Democratizing Science Through Advanced Cyberinfrastructure," Computer, vol. 55, no. 9, pp. 79-84, 2022, doi:10.1109/MC.2022.3174928.
Advanced Computing Ecosystem as a Strategic National Asset
National Strategic Computing Reserve (NSCR)
• A coalition of experts and resource providers that could be mobilized quickly to provide critical computational resources in times of urgent need
• Builds on experiences from the COVID-19 HPC Consortium and responses to an RFI
• Aligns with the FACE Strategic Plan
NSF’s Advanced Cyberinfrastructure Ecosystem: Highly Accessible Computing
• Network of advanced systems and services
• Leadership and capacity systems, testbeds
• Federation (PATh) and coordination services (ACCESS)
• Scalable user support networks
https://www.whitehouse.gov/wp-content/uploads/2021/10/National-Strategic-Computing-Reserve-Blueprint-Oct2021.pdf
Democratized access to an advanced CI Ecosystem
Realizing an Advanced CI Ecosystem for All
• Integrated and user-friendly portals and
gateways for discovering and accessing
resources;
• Access to local CI resources as part of a shared
fabric of national CI resources reachable through
high-speed frictionless data networking;
• Diverse and flexible allocation and access
modes that support a diversity of users and
applications;
• Agile, easily accessible, and scalable networks of
experts providing embedded expertise and
support that is responsive to local needs; and
• Broadly accessible training targeting the
spectrum of CI users and skills.
The Missing Millions: Democratizing Computation and Data to Bridge Digital Divides and Increase Access to Science for Underrepresented Communities (A. Blatecky, EAGER)
https://www.rti.org/publication/missing-millions/fulltext.pdf