
The Synergy Between CHASE-CI and CineGrid


Opening Talk
CineGrid/CHASE-CI Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 15, 2018



  1. “The Synergy Between CHASE-CI and CineGrid”: Opening Talk, CineGrid/CHASE-CI Workshop, Calit2’s Qualcomm Institute, University of California, San Diego, May 15, 2018. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
  2. DOE ESnet’s Science DMZ Creates a Separate Network for Big Data Applications. A Science DMZ integrates four key concepts into a unified whole: a network architecture designed for high-performance applications, with the science network distinct from the general-purpose network; the use of dedicated systems as data transfer nodes (DTNs); performance measurement and network testing systems that are regularly used to characterize and troubleshoot the network; and security policies and enforcement mechanisms that are tailored for high-performance science environments. (The term “Science DMZ” was coined in 2010.)
  3. Based on Community Input and on ESnet’s Science DMZ Concept, NSF Has Made Over 200 Campus-Level Awards in 44 States. Source: Kevin Thompson, NSF
  4. The Pacific Research Platform (PRP) Interconnects Campus DMZs: CENIC and Pacific Wave are the Optical Backplane. NSF CC*DNI grant, $5M, 10/2015-10/2020. PI: Larry Smarr, UC San Diego Calit2. Co-PIs: Camille Crittenden, UC Berkeley CITRIS; Tom DeFanti, UC San Diego Calit2/QI; Philip Papadopoulos, UCSD SDSC; Frank Wuerthwein, UCSD Physics & SDSC
  5. Big Data Science Data Transfer Nodes (DTNs): Flash I/O Network Appliances (FIONAs). Key innovation: UCSD-designed FIONAs provide disk-to-disk data transfer at full speed on 10/40/100G networks. FIONA PCs [a.k.a. ESnet DTNs] are ~$8,000 Big Data PCs with 1 CPU, 10/40 Gbps network interface cards, and 3 TB SSDs or a 100+ TB disk drive, extensible for higher performance with up to 38 Intel CPUs, up to 8 GPUs [4M GPU core-hours/week], NVMe SSDs for 100 Gbps disk-to-disk transfer, and up to 160 TB of disk for data posting. $700 10 Gbps FIONAs are being tested. FIONettes are $250 FIONAs with a 1 Gbps NIC and USB-3 for flash storage or SSD. Source: Phil Papadopoulos, SDSC; Tom DeFanti, Joe Keefe & John Graham, Calit2
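The DTN speeds quoted above lend themselves to a quick back-of-envelope calculation. The sketch below is illustrative only: it assumes the NIC line rate is the bottleneck and an assumed ~70% effective goodput after TCP and protocol overhead (a figure chosen for illustration, not from the slides).

```python
# Back-of-envelope disk-to-disk transfer times for a FIONA-class DTN.
EFFICIENCY = 0.70  # assumed fraction of line rate achieved end-to-end

def transfer_seconds(dataset_tb: float, nic_gbps: float, eff: float = EFFICIENCY) -> float:
    """Seconds to move dataset_tb terabytes over a nic_gbps link."""
    bits = dataset_tb * 1e12 * 8           # TB -> bits (decimal units)
    return bits / (nic_gbps * 1e9 * eff)   # effective bits per second

# FIONette (1G) and FIONA (10/40/100G) NIC speeds, moving the 3 TB SSD tier
for gbps in (1, 10, 40, 100):
    t = transfer_seconds(3.0, gbps)
    print(f"{gbps:>3} Gbps: {t / 3600:6.2f} hours")
```

Under these assumptions a 3 TB dataset drops from roughly nine and a half hours at 1 Gbps to under six minutes at 100 Gbps, which is the motivation for the NVMe-equipped 100G FIONAs.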
  6. Running Kubernetes/Rook/Ceph on PRP Allows Us to Deploy a Distributed PB+ of Storage for Posting Science Data. Rook/Ceph provides block/object/filesystem storage with a Swift-compatible API (interoperable with SDSC, AWS, and Rackspace); Kubernetes runs on CentOS 7. [Diagram, March 2018, John Graham, UCSD: FIONA8 and NVMe nodes at Calit2, SDSC, SDSU, Caltech, UCAR, UCI, UCR, USC, UCLA, Stanford, UCSB, UCSC, and Hawaii, connected at 40G (160 TB) or 100G (6.4 TB NVMe), with sdx-controller/controller-0 at Calit2.]
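As a rough illustration of the deployment pattern named on this slide, a Rook/Ceph cluster is typically brought up by applying the operator and cluster manifests from Rook's examples directory onto an existing Kubernetes cluster. The file names below follow the Rook documentation, but exact paths and versions vary by release; this is a hedged sketch, not the PRP's actual configuration.

```shell
# Hedged sketch: deploy Rook's Ceph operator and a cluster onto an
# existing Kubernetes cluster, then add an object store for the S3/Swift API.
# Manifest names follow the Rook examples directory; paths vary by release.
git clone https://github.com/rook/rook.git
cd rook/deploy/examples

kubectl create -f crds.yaml -f common.yaml -f operator.yaml  # Rook operator
kubectl create -f cluster.yaml                               # Ceph cluster
kubectl create -f object.yaml                                # object store (RGW)

# Verify the Ceph daemons came up in the rook-ceph namespace
kubectl -n rook-ceph get pods
```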
  7. Operational Metrics: Containerized Traceroute Tool Allows Realtime Visualization of the Status of Network Links Among All Kubernetes Nodes on PRP. This node graph shows UCR as the source of the flow to the mesh. Source: Dmitry Mishin (SDSC), John Graham (Calit2)
  8. Operational Metrics: Containerized perfSONAR MaDDash Dashboards for Realtime Measurements of PRP Number of Paths and Packet Loss. Source: Dmitry Mishin (SDSC), John Graham (Calit2)
  9. Data Transfer Rates From 40 Gbps DTN in UCSD Physics Building, Across Campus on PRISM DMZ, Then to Chicago’s Fermilab Over CENIC/ESnet. Based on this success, Würthwein will upgrade the 40G DTN to 100G for bandwidth tests and Kubernetes integration with OSG, Caltech, and UCSC. Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
  10. Global Scientific Instruments Will Produce Ultralarge Datasets Continuously, Requiring Dedicated Optical Fiber and Supercomputers. Large Synoptic Survey Telescope: a 3.2-Gpixel camera tracks ~40B objects and creates 1-10M alerts/night within 1 minute of observing; 1000 supernovas discovered per night; 2x100 Gb/s; “first light” in 2019. Talk by Shaw Dong, UCSC, yesterday.
  11. PRP to Include NSF’s Ocean Observatory Initiative: Fiber-Optic Ocean Observatory on the Seafloor off Washington/Oregon, Connected to PRP via Pacific Wave. Sea-bottom electro-optical cable: 8,000 volts, 10 Gbps optics; Axial Volcano hosts 140 scientific instruments. [Map: Pacific City and Neptune Canada cable routes near 45°N-47°30’N, 130°W-127°30’W, with the Seattle GigaPOP and Portland.] Slide courtesy John Delaney, UWash
  12. Being There: Remote Live High-Definition Video of Deep-Sea Hydrothermal Vents. Mushroom hydrothermal vent on Axial Seamount, 1 mile below sea level [scale: 15 feet]; picture created from 40 HD frames; 14 minutes of live HD video online every 3 hours. Slide courtesy John Delaney, UWash
  13. John Delaney Viewing a High-Res Mosaic of the Mushroom Hydrothermal Vent on Axial Seamount in Calit2’s VROOM. Photo by Tom DeFanti, Calit2, July 26, 2017
  14. Video of Live Video From an ROV, Controlled From a Laptop in Calit2’s VROOM
  15. PRP Now Enables Distributed Virtual Reality: 40G FIONAs link the 20x40G PRP-connected WAVE @UC San Diego to the PRP WAVE @UC Merced. Transferring 5 CAVEcam images from UCSD to UC Merced: 2 gigabytes now takes 2 seconds (8 Gb/sec)
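The rate quoted on this slide is easy to verify: 2 gigabytes in 2 seconds is 8 gigabits per second. A one-line sanity check (decimal units, as in the slide):

```python
# Sanity check on the quoted CAVEcam transfer rate.
def gbps(bytes_moved: float, seconds: float) -> float:
    """Average throughput in gigabits per second (decimal units)."""
    return bytes_moved * 8 / seconds / 1e9

rate = gbps(2e9, 2.0)  # 2 GB moved in 2 s
print(rate)            # → 8.0
```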
  16. Collaboration on Atmospheric Water in the West Between UC San Diego and UC Irvine: a Big Data collaboration between CW3E (Director: F. Martin Ralph) and UCI-CHRS (Director: Soroosh Sorooshian). Source: Scott Sellars, CW3E
  17. Major Speedup in Scientific Workflow Using the PRP: complete workflow time went from 20 days to 20 hours to 20 minutes! The workflow spans GPUs on Calit2’s FIONA at UC Irvine, SDSC’s COMET, and Calit2’s FIONA at UC San Diego, connected by the Pacific Research Platform (10-100 Gb/s). Source: Scott Sellars, CW3E
  18. Convert Global Precipitation Maps to a Database of Precipitation Spacetime Objects. The CONNected objECT (CONNECT) algorithm, developed at UCI-CHRS (team: Wei Chu, Scott Sellars, Phu Nguyen, Xiaogang Gao, Kuo-lin Hsu, and Soroosh Sorooshian), segments objects in a data hypercube (longitude, latitude, time, t=1 … t=5); most algorithms do not track events over their life cycles. Object criteria: each voxel must have at least 1 mm/hr (5 mm/hr rainfall shown); each object must exist for 24 hours; 6-voxel connectivity. Data: 60N-60S, 0-360 lat and long; hourly time step; March 1st, 2000 to January 1st, 2011. Object storage (PostgreSQL), with database indexes on object ID number, latitude (of each voxel in an object), longitude (of each voxel in an object), and time (hour). Source: Sellars et al. 2013, 2015
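The segmentation step described on this slide amounts to thresholding the precipitation hypercube and grouping 6-connected voxels into objects. The sketch below is a simplified illustration of that step in pure Python with toy data; the threshold value, grid, and function name are illustrative, and the real CONNECT algorithm adds further criteria (e.g. 24-hour persistence) before storing objects.

```python
# Simplified CONNECT-style segmentation: threshold a (time, lat, lon)
# precipitation grid and label 6-connected voxel objects with BFS.
from collections import deque

def label_objects(grid, threshold=1.0):
    """Return {object_id: [(t, y, x), ...]} of 6-connected voxels >= threshold."""
    T, Y, X = len(grid), len(grid[0]), len(grid[0][0])
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen, objects, next_id = set(), {}, 0
    for t in range(T):
        for y in range(Y):
            for x in range(X):
                if (t, y, x) in seen or grid[t][y][x] < threshold:
                    continue
                # Breadth-first flood fill of one connected object
                queue, voxels = deque([(t, y, x)]), []
                seen.add((t, y, x))
                while queue:
                    ct, cy, cx = queue.popleft()
                    voxels.append((ct, cy, cx))
                    for dt, dy, dx in neighbors:
                        nt, ny, nx = ct + dt, cy + dy, cx + dx
                        if (0 <= nt < T and 0 <= ny < Y and 0 <= nx < X
                                and (nt, ny, nx) not in seen
                                and grid[nt][ny][nx] >= threshold):
                            seen.add((nt, ny, nx))
                            queue.append((nt, ny, nx))
                objects[next_id] = voxels
                next_id += 1
    return objects

# Toy 2-timestep, 2x3 grid (mm/hr): two separate rain objects
grid = [[[2.0, 0.0, 3.0],
         [0.0, 0.0, 3.5]],
        [[2.5, 0.0, 0.0],
         [0.0, 0.0, 0.0]]]
print(len(label_objects(grid)))  # → 2
```

Each labeled object's voxel list maps directly onto the PostgreSQL schema on the slide: object ID, plus latitude, longitude, and time for every voxel.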
  19. Using Machine Learning to Determine the Precipitation Object Starting Locations. Sellars et al., 2017 (in prep)
  20. UC San Diego Jaffe Lab (SIO): Scripps Plankton Camera off the SIO Pier with a Fiber-Optic Network
  21. Over 1 Billion Images So Far! Requires Machine Learning for Automated Image Analysis and Classification. Phytoplankton: diatoms; zooplankton: copepods and larvaceans. Source: Jules Jaffe, SIO: “We are using the FIONAs for image processing... this includes doing Particle Tracking Velocimetry that is very computationally intense.”
  22. New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure: Adding a Machine Learning Layer Built on Top of the Pacific Research Platform. An NSF grant for a high-speed “cloud” of 256 GPUs for 30 ML faculty and their students at 10 campuses (Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU) for training AI algorithms on Big Data. NSF Program Officer: Mimi McClure
  23. CHASE-CI’s ML Researchers Are Exploring Mapping Machine Learning Algorithm Families Onto Novel Architectures (Qualcomm Institute): 1. Deep & Recurrent Neural Networks (DNN, RNN); 2. Reinforcement Learning (RL); 3. Variational Autoencoder (VAE) and Markov Chain Monte Carlo (MCMC); 4. Support Vector Machine (SVM); 5. Sparse Signal Processing (SSP) and Sparse Bayesian Learning (SBL); 6. Latent Variable Analysis (PCA, ICA)
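To make one of the families above concrete, here is a toy sketch of latent variable analysis: computing the first principal component (PCA) of 2-D points by power iteration on the covariance matrix, in pure Python. The data and function name are illustrative, not CHASE-CI code.

```python
# Toy PCA via power iteration: first principal component of 2-D points.
import math

def first_pc(points, iters=200):
    """First principal component (unit vector) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)  # initial guess
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize
        wx = cxx * v[0] + cxy * v[1]
        wy = cxy * v[0] + cyy * v[1]
        norm = math.hypot(wx, wy)
        v = (wx / norm, wy / norm)
    return v

# Points along the line y = x: expect a component near (0.7071, 0.7071)
pc = first_pc([(0, 0), (1, 1), (2, 2), (3, 3)])
print(pc)
```

GPU libraries accelerate exactly this kind of repeated matrix-vector work, which is why a 256-GPU cloud suits these algorithm families.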
  24. Next 15!
  25. Google Has Designed and Deployed a Non-von-Neumann (NvN) TensorFlow Accelerator; Calit2 is Negotiating Access for CHASE-CI
  26. Partnering with Cloud Vendors Adds Non-von-Neumann Processors to CHASE-CI. Microsoft is putting FPGAs into their data centers to accelerate critical applications, and is providing access for research purposes to 432 FPGAs in the Texas Advanced Computing Center; TACC is joining PRP.
  27. Intel is Positioned to Integrate Multicore CPUs With GPUs, FPGAs, and ML Accelerators
  28. The Second National Research Platform Workshop, Bozeman, MT, August 6-7, 2018. Announced in the Internet2 closing keynote by Larry Smarr, “Toward a National Big Data Superhighway,” on Wednesday, April 26, 2017. Co-chairs: Larry Smarr, Calit2; Inder Monga, ESnet; Ana Hunsinger, Internet2. Local host: Jerry Sheehan, MSU