PRP, NRP, GRP & the Path Forward

Presentation
2nd National Research Platform Workshop
Bozeman, MT

Published in: Data & Analytics

  1. 1. “PRP, NRP, GRP, & the Path Forward” Presentation 2nd National Research Platform Workshop Bozeman, MT August 6, 2018 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net
  2. 2. ESnet’s ScienceDMZ Accelerates Science Research: DOE & NSF Partnering on Science Engagement and Technology Adoption Science DMZ Data Transfer Nodes (DTN/FIONA) Network Architecture (zero friction) Performance Monitoring (perfSONAR) ScienceDMZ Coined in 2010 by ESnet Basis of PRP Architecture and Design http://fasterdata.es.net/science-dmz/ DOE NSF NSF CC* program (2012+) Funded Deployment of ScienceDMZ on 200 Univ. campuses www.nsf.gov/funding/pgm_summ.jsp?pims_id=504748 Slide From Inder Monga, ESnet See Talk by Eli Dart & Deep Dive #2 Tuesday
  3. 3. (GDC) Logical Next Step: The Pacific Research Platform Networks Campus DMZs to Create a Regional End-to-End Science-Driven “Big Data Superhighway” System NSF CC*DNI Grant $5M 10/2015-10/2020 PI: Larry Smarr, UC San Diego Calit2 Co-PIs: • Camille Crittenden, UC Berkeley CITRIS, • Tom DeFanti, UC San Diego Calit2/QI, • Philip Papadopoulos, UCSD SDSC, • Frank Wuerthwein, UCSD Physics and SDSC Letters of Commitment from: • 50 Researchers from 15 Campuses • 32 IT/Network Organization Leaders NSF Program Officer: Amy Walton Source: John Hess, CENIC
  4. 4. PRP National-Scale Experimental Distributed Pilot: Using CENIC & Internet2 to Connect Early-Adopter Quilt Regional R&E Networks Announced May 8, 2018 Internet2 Global Summit See NRP Pilot Monday; Scaling Tuesday Original PRP CENIC/PW Link Extended PRP Testbed NSF CENIC Link
  5. 5. PRP Science DMZ Data Transfer Nodes (DTNs) - Flash I/O Network Appliances (FIONAs) UCSD Designed FIONAs To Solve the Disk-to-Disk Data Transfer Problem at Full Speed on 10G, 40G and 100G Networks FIONAS—10/40G, $8,000 Phil Papadopoulos, SDSC & Tom DeFanti, Joe Keefe & John Graham, Calit2 FIONette—1G, $250 Five Racked FIONAs at Calit2: • Each Contains: • Dual 12-Core CPUs • 96GB RAM • 1TB SSD • 2 10GbE interfaces • Total ~$10,500 • With 8 GPUs • total ~$18,500 Report on 3-Day FIONA Hands-On Workshop For EPSCoR, MSI & EPSCoR Deep Dive #3 Monday; EPSCoR Talk Tuesday
  6. 6. GPN Becomes the First Multi-State Regional Network to Peer with the PRP Between the PRP-Contributed PWave DTN in Los Angeles To GPN FIONA in UMC Before PRP 0.8 Gbps, In May Seeing 3.7 Gbps Over PRP, Now 11 Gbps Source: John Hess, CENIC and George Rob III, UMissouri May 30, 2018 See James Deaton NRP Pilot Monday
  7. 7. Game Changer: Using Kubernetes to Manage Containers Across the PRP “Kubernetes is a way of stitching together a collection of machines into, basically, a big computer,” --Craig McLuckie, Google and now CEO and Founder of Heptio “Everything at Google runs in a container.” --Joe Beda, Google “Kubernetes has emerged as the container orchestration engine of choice for many cloud providers including Google, AWS, Rackspace, and Microsoft, and is now being used in HPC and Science DMZs.” --John Graham, Calit2/QI UC San Diego Amazingly, I Didn’t Mention Kubernetes Last Year Kubernetes Tutorial Sunday
  8. 8. Rook is Ceph Cloud-Native Object Storage ‘Inside’ Kubernetes https://rook.io/ Source: John Graham, Calit2/QI Kubernetes Tutorial Sunday
  9. 9. 40G 160TB 40G 160TB HPWREN 100G NVMe 6.4TB FIONA8 2.5 FIONA8s 100G Epyc NVMe 100G Gold NVMe July 2018 John Graham, UCSD 100G NVMe 6.4TB Caltech* 40G 160TB UCAR FIONA8 FIONA8 3 FIONA8s Calit2/UCI FIONA8 FIONA8 >50 FIONA2s FIONA8 FIONA8 6 FIONA8s sdx-controller 2x40G 160TB HPWREN Calit2/QI*/SIO 100G Gold FIONA8 SDSC 40G 160TB UCR 40G 160TB USC* 2x40G 160TB UCLA 40G 160TB Stanford U 40G 160TB UCSB 100G NVMe 6.4TB 40G 160TB UCSC* 40G 160TB U Hawaii PRP is Deploying Distributed Petabytes of Storage for Posting/Staging Data at $10/TB per Year by Leveraging our Base of Installed FIONAs 10G FIONA $1K 40G 160TB HPWREN 100G NVMe 6.4TB 2 FIONA4s SDSU* Kubernetes CentOS 7 Rook/Ceph - Block/Object/FS Swift API compatible with SDSC, AWS, and Rackspace Alex Szalay Deep Dive #4 Monday Rob Gardner Tuesday Dima Mishin Sunday
  10. 10. Operational Metrics: Containerized Trace Route Tool Allows Realtime Visualization of Status of Network Links All Kubernetes Nodes on PRP Source: Dmitry Mishin (SDSC), John Graham (Calit2) This node graph shows UCR as the source of the flow to the mesh
  11. 11. Operational Metrics: Containerized perfSONAR MaDDash Dashboards For Realtime Measurements of PRP Number of Paths and Packet Loss Source: Dmitry Mishin (SDSC), John Graham (Calit2)
  12. 12. Quilt Members Have Built Their Own perfSONAR MaDDash Inspired by PRP http://quiltmesh.onenet.net/maddash-webui/ Source: Jen Leasure, Quilt Aug. 4, 2018
  13. 13. Expanding to the Global Research Platform (GRP) Via CENIC/Pacific Wave, Internet2, and International Links PRP/ CENIC/PW PRP’s Current International Partners Korea Shows Distance is Not the Barrier to Above 5Gb/s Disk-to-Disk Performance Netherlands Guam Australia Korea Japan Singapore International- Scale Measurement Technologies/ Techniques Tuesday
  14. 14. PRP’s First 2.5 Years: Connecting Multi-Campus Application Teams and Devices Earth Sciences See Following Panel: Science Drivers for NRP
  15. 15. PRP Science Application Class #1: Providing High Performance Access to Distributed Data Analysis
  16. 16. Data Transfer Rates From 40 Gbps DTN in UCSD Physics Building, Across Campus on PRISM DMZ, Then to Chicago’s Fermilab Over CENIC/ESnet Based on This Success, Würthwein Will Upgrade 40G DTN to 100G For Bandwidth Tests & Kubernetes Integration With OSG, Caltech, and UCSC Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
  17. 17. PRP Distributed Tier-2 Cache Across Caltech & UCSD - Thousands of Flows Sustaining >10Gbps! Cache Server Cache Server… Redirector Cache Server Cache Server… Redirector UCSD Caltech Redirector Top Level Cache Global Data Federation of CMS Provisioned pilot systems: PRP UCSD: 9 x 12 SATA Disk of 2TB @ 10Gbps for Each System PRP Caltech: 2 x 30 SATA Disk of 6TB @ 40Gbps for Each System Source: Frank Würthwein, OSG, UCSD/SDSC, PRP; Harvey Newman, Caltech
  18. 18. Collaboration Opportunity with OSG/PRP/I2 on Distributed Storage 1.8PB, 1.2PB, 1.6PB, 210TB Total data volume pulled last year is dominated by 4 caches. OSG Is Operating a Distributed Caching CI. At Present, 4 Caches Provide Significant Use PRP Kubernetes Infrastructure Could Either Grow Existing Caches by Adding Servers, or by Adding Additional Locations StashCache Users include: LIGO DES Source: Frank Würthwein, OSG, UCSD/SDSC, PRP See Talk on OSG/PRP/I2 Tuesday
  19. 19. PRP Science Application Class #2: Providing High Performance Access to Remote Supercomputers
  20. 20. Distributed Computation on PRP Coupling SDSU Cluster and SDSC Comet Using Kubernetes Containers 25 years Developed and executed MPI-based PRP Kubernetes Cluster execution [CO2,aq] 100 Year Simulation 4 days 75 years 100 years • 0.5 km x 0.5 km x 17.5 m • Three sandstone layers separated by two shale layers Simulating the Injection of CO2 in Brine-Saturated Reservoirs: Poroelastic & Pressure-Velocity Fields Solved In Parallel With MPI Using Domain Decomposition Across Containers Source: Chris Paolini and Jose Castillo, SDSU See Talk by Chris Paolini Sunday
  21. 21. Speeding Downloads Using 100 Gbps PRP Link Over CENIC Couples UC Santa Cruz Astrophysics Cluster to LBNL NERSC Supercomputer CENIC 2018 Innovations in Networking Award for Research Applications NSF-Funded Cyberengineer Shaw Dong @UCSC Receiving FIONA Feb 7, 2017
  22. 22. The Great Plains Network Has Many Campuses With Active Projects at SDSC GPN Map Source: James Deaton, GPN Shawn Strande, SDSC
  23. 23. PRP Science Application Class #3: Providing High Perf. Access to SensorNets Coupled to Realtime Computing
  24. 24. Church Fire, San Diego CA Alert SD&ECameras/HPWREN October 21, 2017 New PRP Application: Coupling Wireless Wildfire Sensors to Computing Thomas Fire, Ventura, CA Firemap Tool, WIFIRE December 10, 2017 CENIC 2018 Innovations in Networking Award for Experimental Applications See HPWREN Deep Dive #1 Tuesday
  25. 25. Once a Wildfire is Spotted, PRP Brings High-Resolution Weather Data to Fire Modeling Workflows in WIFIRE Real-Time Meteorological Sensors Weather Forecast Landscape data WIFIRE Firemap Fire Perimeter Work Flow PRP Source: Ilkay Altintas, SDSC
  26. 26. Fiber Optic Network Streams Images From UC San Diego Jaffe Lab (SIO) Scripps Plankton Microscope Camera
  27. 27. Over 1 Billion Images So Far! Requires Machine Learning for Automated Image Analysis and Classification Phytoplankton: Diatoms Zooplankton: Copepods Zooplankton: Larvaceans Source: Jules Jaffe, SIO “We are using the FIONAs for image processing... this includes doing Particle Tracking Velocimetry that is very computationally intense.” --Jules Jaffe
  28. 28. Adding Machine Learning to PRP: Left & Right Brain Computing: Arithmetic vs. Pattern Recognition Adapted from D-Wave
  29. 29. New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure: Adding a Machine Learning Layer Built on Top of the Pacific Research Platform Caltech UCB UCI UCR UCSD UCSC Stanford MSU UCM SDSU NSF Grant for High Speed “Cloud” of 256 GPUs For 30 ML Faculty & Their Students at 10 Campuses for Training AI Algorithms on Big Data See Venkat Vishwanath, Deep Dive #4 Tuesday
  30. 30. FIONA8: Adding GPUs to FIONAs Supports Data Science Machine Learning Multi-Tenant Containerized GPU JupyterHub Running Kubernetes / CoreOS Eight Nvidia GTX-1080 Ti GPUs ~$13K 32GB RAM, 3TB SSD, 40G & Dual 10G ports Source: John Graham, Calit2
  31. 31. 48 GPUs for OSG Applications UCSD Adding >350 Game GPUs to Data Sciences Cyberinfrastructure - Devoted to Data Analytics and Machine Learning SunCAVE 70 GPUs WAVE + Vroom 48 GPUs FIONA with 8-Game GPUs 95 GPUs for Students CHASE-CI Grant Provides 96 GPUs at UCSD for Training AI Algorithms on Big Data Plus 288 64-bit GPUs On SDSC’s Comet
  32. 32. Next Step: Using Kubernetes to Surround the PRP Machine Learning Platform With Clouds of CPUs, GPUs and Non-Von Neumann Processors CHASE-CI 64-TrueNorth Cluster 64-bit GPUs 4352x NVIDIA Tesla V100 GPUs See Talks by NSF Clouds, Google, Amazon Microsoft Installs Altera FPGAs into Bing Servers & 384 into TACC for Academic Access
  33. 33. Calit2 Has Established Labs On Both UC San Diego and UC Irvine Campuses For Exploring Machine Learning on von Neumann and NvN Processors Charless Fowlkes, Director Ken Kreutz Delgado, Director
  34. 34. Our Support: • US National Science Foundation (NSF) awards CNS-0821155, CNS-1338192, CNS-1456638, CNS-1730158, ACI-1540112, & ACI-1541349 • University of California Office of the President CIO • UCSD Chancellor’s Integrated Digital Infrastructure Program • UCSD Next Generation Networking initiative • Calit2 and Calit2 Qualcomm Institute • CENIC, PacificWave and StarLight • DOE ESnet
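
Note on slide 17: the provisioning figures for the pilot cache systems can be turned into raw capacity and aggregate link bandwidth with a little arithmetic. The sketch below is not from the talk. It assumes that "9 x 12 SATA Disk of 2TB @ 10Gbps for Each System" means 9 servers, each with 12 disks of 2 TB and one 10 Gbps link, and reads the Caltech line the same way. The `CachePool` class and `summarize` function are hypothetical names introduced only for illustration, and the results are raw totals that ignore any replication or filesystem overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePool:
    """One site's pilot cache systems, as read (by assumption) from slide 17."""
    site: str
    servers: int            # number of cache systems at the site
    disks_per_server: int   # SATA disks in each system
    disk_tb: int            # capacity of one disk, in TB
    link_gbps: int          # network link of each system, in Gbps

    def raw_capacity_tb(self) -> int:
        # Raw capacity only: no allowance for redundancy or formatting.
        return self.servers * self.disks_per_server * self.disk_tb

    def aggregate_link_gbps(self) -> int:
        # Upper bound if every system could fill its link at once.
        return self.servers * self.link_gbps


# Figures taken from slide 17; the per-server interpretation is an assumption.
POOLS = [
    CachePool("UCSD", servers=9, disks_per_server=12, disk_tb=2, link_gbps=10),
    CachePool("Caltech", servers=2, disks_per_server=30, disk_tb=6, link_gbps=40),
]


def summarize(pools):
    """Map each site to (raw capacity in TB, aggregate link bandwidth in Gbps)."""
    return {p.site: (p.raw_capacity_tb(), p.aggregate_link_gbps()) for p in pools}


if __name__ == "__main__":
    for site, (tb, gbps) in summarize(POOLS).items():
        print(f"{site}: {tb} TB raw, {gbps} Gbps aggregate")
```

Under this reading, the two sites together hold roughly 576 TB of raw disk, with the smaller number of Caltech systems carrying larger disks and faster links.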

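Note on slide 6: the three throughput figures reported for the Los Angeles to GPN path (0.8 Gbps before PRP, 3.7 Gbps in May, 11 Gbps afterwards) are easier to compare as transfer times. The sketch below is an illustration only. The 1 TB dataset size is a hypothetical value chosen for the example, the function name `transfer_seconds` is made up here, and the calculation assumes decimal units (1 TB = 8e12 bits) and a constant sustained rate with no protocol overhead.

```python
def transfer_seconds(size_tb: float, rate_gbps: float) -> float:
    """Idealized time to move size_tb terabytes at a sustained rate_gbps."""
    if rate_gbps <= 0:
        raise ValueError("rate_gbps must be positive")
    bits = size_tb * 8e12           # decimal terabytes to bits
    return bits / (rate_gbps * 1e9)  # gigabits per second to bits per second


# Rates quoted on slide 6; the labels are shorthand for this example.
RATES_GBPS = {"before PRP": 0.8, "May 2018": 3.7, "now": 11.0}

if __name__ == "__main__":
    dataset_tb = 1.0  # hypothetical dataset size, not from the talk
    for label, rate in RATES_GBPS.items():
        minutes = transfer_seconds(dataset_tb, rate) / 60
        print(f"{label}: {rate} Gbps -> {minutes:.1f} minutes per TB")
```

On these assumptions, the same terabyte that took hours at the pre-PRP rate moves in a matter of minutes at 11 Gbps; real transfers would also depend on disk speed and tuning at both ends.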