“The Pacific Research Platform:
A Regional-Scale Big Data Analytics
Cyberinfrastructure”
National Ocean Exploration Forum 2017
Ocean Exploration in a Sea of Data
Calit2’s Qualcomm Institute
University of California, San Diego
October 21, 2017
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
Vision:
Use Optical Fiber to Connect
Big Data Generators and Consumers,
Creating a “Big Data” Freeway System
“The Bisection Bandwidth of a Cluster Interconnect,
but Deployed on a 20-Campus Scale.”
This Vision Has Been Building for 15 Years
The NSF OptIPuter Project: Using Supernetworks
to Meet the Needs of Data-Intensive Researchers
OptIPortal–
Termination
Device
for the
OptIPuter
Global
Backplane
Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
2003-2009
$13,500,000
In August 2003,
Jason Leigh and his
students used
RBUDP to blast
data from NCSA to
SDSC over the
TeraGrid DTFnet,
achieving 18 Gbps
file transfer out of
the available
20Gbps
LS Slide 2005
We Have Been Working Towards Distributed Big Data for 15 Years:
NSF OptIPuter, Quartzite, Prism Awards
PI Papadopoulos,
2013-2015
PI Smarr,
2002-2009
PI Papadopoulos,
2004-2007
Based on Community Input and on ESnet’s Science DMZ Concept,
NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways
Red 2012 CC-NIE Awardees
Yellow 2013 CC-NIE Awardees
Green 2014 CC*IIE Awardees
Blue 2015 CC*DNI Awardees
Purple Multiple Time Awardees
Source: NSF
Terminating the Fiber Optics - Data Transfer Nodes (DTNs):
Flash I/O Network Appliances (FIONAs)
UCSD Designed FIONAs
To Solve the Disk-to-Disk
Data Transfer Problem
For Big Data
at Full Speed
on 10G, 40G and 100G Networks
FIONAs—10/40G, $8,000
FIONette—1G, $1,000
Phil Papadopoulos, SDSC &
Tom DeFanti, Joe Keefe & John Graham, Calit2
John Graham, Calit2
How UCSD DMZ Network Transforms Big Data Microbiome Science:
Preparing for Knight/Smarr 1 Million Core-Hour Analysis
Knight Lab
FIONA
10Gbps
Gordon
Prism@UCSD
Data Oasis
7.5PB,
200GB/s
Knight 1024 Cluster
In SDSC Co-Lo
CHERuB
100Gbps
Emperor & Other Vis Tools
64Mpixel Data Analysis Wall
120Gbps
40Gbps
1.3Tbps
(GDC)
Logical Next Step: The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Superhighway” System
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego; Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
We Measure Disk-to-Disk Throughput with 10GB File Transfer
4 Times Per Day in Both Directions for All PRP Sites
January 29, 2016
From Start of Monitoring 12 DTNs
to 24 DTNs Connected at 10-40G
in 1 ½ Years
July 21, 2017
Source: John Graham, Calit2
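The monitoring described above reduces to simple arithmetic; a minimal Python sketch (illustrative only, not the PRP's actual measurement tooling) converts a timed disk-to-disk transfer into a line rate:

```python
def throughput_gbps(num_bytes: float, seconds: float) -> float:
    """Convert a timed disk-to-disk transfer into gigabits per second."""
    return num_bytes * 8 / seconds / 1e9

# A 10 GB test file that lands in 8 seconds corresponds to 10 Gbps:
print(throughput_gbps(10e9, 8.0))  # → 10.0
```

The same formula confirms the CAVEcam example later in the deck: 2 GB in 2 seconds is throughput_gbps(2e9, 2.0) = 8 Gb/s.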
PRP’s First 1.5 Years:
Connecting Multi-Campus Application Teams and Devices
Our Prototype System – Built for Scientists
Out of a Bunch of Independently Managed Networks
• Challenge:
– Campus DMZs, Regional (e.g., CENIC), National (Internet2),
International Networks (e.g., GLIF) are Individually-Architected Systems
• How Do They Work Together with Predictable Performance?
• PRP is Focused on Disk-to-Disk Data Movement
– From the Eyes of Domain Scientists
– End-to-End for Their Data is Their Only Real Metric of Concern
(As it Should Be)
Source: Phil Papadopoulos
Data Transfer Rates From UCSD Physics Building Servers
Across Campus and Then To Chicago’s Fermilab
Utilizing UCSD Prism Campus
Optical Network
Source: Frank Wuerthwein, UCSD, SDSC
Global Scientific Instruments Will Produce Ultralarge Datasets Continuously
Requiring Dedicated Optic Fiber and Supercomputers
Square Kilometer Array
https://tnc15.terena.org/getfile/1939
Large Synoptic Survey Telescope
3.2 Gpixel Camera
Tracks ~40B Objects,
Creates 10M Alerts/Night
Within 1 Minute of Observing
2x100Gb/s
“First Light”
In 2019
100 Gbps FIONA at UCSC Allows for Downloads to the UCSC Hyades Cluster
from the LBNL NERSC Supercomputer for DESI Science Analysis
300 images per night.
100MB per raw image
30GB per night
120GB per night
250 images per night.
530MB per raw image
150 GB per night
800GB per night
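The raw-image totals on this slide follow from simple multiplication; a quick sanity check (the larger 120 GB, 150 GB, and 800 GB nightly figures presumably include calibration or processed products beyond the raw images, an inference not stated on the slide):

```python
# Raw nightly volume = images per night × size per raw image
col1_gb = 300 * 100 / 1000   # 100 MB raw images → 30.0 GB per night
col2_gb = 250 * 530 / 1000   # 530 MB raw images → 132.5 GB per night
print(col1_gb, col2_gb)      # → 30.0 132.5
```

The first product matches the quoted 30 GB exactly; the second, 132.5 GB, is close to the quoted 150 GB.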
Source: Peter Nugent, LBNL
Professor of Astronomy, UC Berkeley
Precursors to
LSST and NCSA
NSF-Funded Cyberengineer
Shaw Dong @UCSC
Receiving FIONA
Feb 7, 2017
40G FIONAs
20x40G PRP-connected
WAVE@UC San Diego
PRP Now Enables
Distributed Virtual Reality
PRP
WAVE @UC Merced
Transferring 5 CAVEcam Images from UCSD to UC Merced:
2 Gigabytes now takes 2 Seconds (8 Gb/sec)
Five Examples of Earth Sciences Research Teams
That Are in the Early Stages of Using PRP
Frank Vernon - Expansion of HPWREN
Dan Cayan, Mike Dettinger
Regional Downscaling of Climate Models
Scott Sellars, Marty Ralph
Center for Western Weather and Water Extremes
John Delaney-
Undersea Cabled Observatory
Jules Jaffe,
Ocean Microscope
Dan Cayan
USGS Water Resources Discipline
Scripps Institution of Oceanography, UC San Diego
With much support from Mary Tyree, Mike Dettinger, Guido Franco, and other colleagues
Sponsors:
California Energy Commission
NOAA RISA program
California DWR, DOE, NSF
Planning for climate change in California:
substantial shifts on top of already high climate variability
SIO Campus Climate Researchers Need to Download
Results from NCAR Remote Supercomputer Simulations
to Make Regional Climate Change Forecasts
NCAR Upgrading to 10Gbps Link from Wyoming and Boulder to CENIC/PRP
Improving the Disk-to-Disk Bandwidth
From ORNL and NCAR/Wyoming to the PRP and on to SIO
Calit2 to FIONA in SIO Co-Lo is 40Gbps
NCAR Installed a 100G WAN link Between
the NCAR Wyoming Supercomputer Center (NWSC)
and the Front Range GigaPoP (FRGP),
Which is Linked to CENIC/PRP and UC San Diego
PRP Using Globus Transfer
of a Year of Monthly Output
from the 0.1-Degree Ocean General Circulation Model Run
at ORNL/OLCF to the UCSD FIONA,
Resulted in an 8x Speedup in Transfer Time
The High Performance Wireless Research and Education Network (HPWREN)
Supports Earth and Environmental Sciences in SoCal
https://hpwren.ucsd.edu/
Source: Frank Vernon,
Hans Werner Braun HPWREN
HPWREN Provides Wireless Backhaul for Seismic Stations
on SoCal Fault Lines
Source: Frank Vernon,
Hans Werner Braun HPWREN
HPWREN Real-Time Network Cameras on Mountains
for Environmental Observations and Fires
San Diego County Red Mountain Fire Cameras
• Southeast (left) “Highway” Fire
• Southwest (center rear) “Poinsettia” Fire
• West (right) “Tomahawk” Fire
Source: Frank Vernon,
Hans Werner Braun HPWREN
May 14, 2014
PRP Backbone Sets Stage for 2017 Expansion
of HPWREN, Connected to CENIC, into Orange and Riverside Counties
• PRP CENIC 100G Link
UCSD to SDSU
– DTN FIONAs Endpoints
– Data Redundancy
– Disaster Recovery
– High Availability
– Network Redundancy
• CENIC Enables PRP
10G Links Between
UCSD, SDSU, & UCI
HPWREN Servers
• Potential Future UCR
CENIC Anchor
Source: Frank Vernon, Hans Werner Braun, HPWREN
UCI Antenna Dedicated June 27, 2017
[Map: NEPTUNE Canada cabled array off the Washington coast, 45°N–47°30'N, 130°W–127°30'W, with links north to the Seattle GigaPOP and south toward Portland and Pacific City]
Possible PRP 2017 Expansion to Include NSF’s Ocean Observatory Initiative
Fiber Optic SensorNets on Seafloor Off Washington
To PRP via
Pacific Wave
Sea Bottom
Electro-optical Cable:
8,000 Volts
10 Gbps Optics
Slide Courtesy, John Delaney, UWash
Axial Volcano
140 Scientific Instruments
John Delaney Visiting UCSD’s SIO
For Three Months in 2017
Being There - Remote Live High Definition Video
of Deep Sea Hydrothermal Vents
http://novae.ocean.washington.edu/story/Ashes_CAMHD_Liv
Mushroom Hydrothermal Vent
on Axial Seamount
1 Mile Below Sea Level
Picture Created
From 40 HD Frames
14 Minutes Live HD Video
On-Line Every 3 Hours
15 feet
Slide Courtesy, John Delaney, UWash
Director: F. Martin Ralph; Website: cw3e.ucsd.edu
Big Data Collaboration with:
Source: Scott Sellars, CW3E
Collaboration on Atmospheric Water in the West
Between UC San Diego and UC Irvine
Director: Soroosh Sorooshian, UC Irvine; Website: http://chrs.web.uci.edu
Calit2’s FIONA
SDSC’s COMET
Calit2’s FIONA
Pacific Research Platform (10-100 Gb/s)
GPUs
Complete Workflow Time: 20 Days → 20 Hrs → 20 Minutes!
UC Irvine, UC San Diego
Major Speedup in Scientific Work Flow
Using the PRP
Source: Scott Sellars, CW3E
• CONNected objECT (CONNECT) Algorithm, developed at UCI-CHRS
– Team: Wei Chu, Scott Sellars, Phu Nguyen, Xiaogang Gao, Kuo-lin Hsu, and Soroosh Sorooshian
– Most algorithms do not track events over their life cycle
[Figure: precipitation frames t=1 through t=5 stacked into a data hypercube along longitude and time axes]
Set Object Criteria:
1. Each voxel must have ≥ 1 mm/hr
2. Each object must exist for ≥ 24 hours
3. 6-voxel connectivity
CONNECT: Object Segmentation → Object Storage (PostgreSQL)
Database Indexes:
1. Object ID Number
2. Latitude (of each voxel in object)
3. Longitude (of each voxel in object)
4. Time (hour)
Data:
1. Latitude 60°N–60°S, Longitude 0°–360°
2. Hourly time step
3. March 1, 2000 to January 1, 2011
(Figure color scale: rainfall up to 5 mm/hr)
Convert Global Precipitation Maps to a Database of Precipitation Spacetime Objects
Source: Sellars et al. 2013, 2015
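The object criteria above amount to thresholding followed by 6-connected component labeling over the precipitation hypercube. A sketch of that step in Python, with an illustrative grid and function name (the actual CONNECT implementation is not shown in these slides):

```python
from collections import deque

def label_objects(grid, threshold=1.0):
    """Group voxels >= threshold into 6-connected components.
    `grid` is a nested list indexed [t][lat][lon]; returns a list of
    components, each a list of (t, lat, lon) voxel coordinates."""
    nt, nlat, nlon = len(grid), len(grid[0]), len(grid[0][0])
    seen, objects = set(), []
    for t in range(nt):
        for i in range(nlat):
            for j in range(nlon):
                if grid[t][i][j] < threshold or (t, i, j) in seen:
                    continue
                # Breadth-first flood fill over the 6 face neighbors
                comp, queue = [], deque([(t, i, j)])
                seen.add((t, i, j))
                while queue:
                    ct, ci, cj = queue.popleft()
                    comp.append((ct, ci, cj))
                    for dt, di, dj in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                        n = (ct + dt, ci + di, cj + dj)
                        if (0 <= n[0] < nt and 0 <= n[1] < nlat
                                and 0 <= n[2] < nlon
                                and n not in seen
                                and grid[n[0]][n[1]][n[2]] >= threshold):
                            seen.add(n)
                            queue.append(n)
                objects.append(comp)
    return objects

# Two rain cells in a tiny 2x2x2 hypercube: only face-adjacent wet
# voxels join the same object.
g = [[[5.0, 0.0], [0.0, 2.0]],
     [[5.0, 0.0], [0.0, 0.0]]]
print(len(label_objects(g)))  # → 2
```

A production version would additionally discard objects lasting under 24 hours and write each object's voxel list to PostgreSQL, per the storage scheme on the slide.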
Using Machine Learning to Determine
the Precipitation Object Starting Locations
*Sellars et al., 2017 (in prep)
Jaffe Lab (SIO) Scripps Plankton Camera
Off the SIO Pier with Fiber Optic Network
Over 300 Million Images So Far!
Capturing Vast Microscopic Biodiversity in the Oceans
Phytoplankton: Diatoms
Zooplankton: Copepods
Zooplankton: Larvaceans
Source: Jules Jaffe, SIO
“We are using the FIONAs for image processing...
this includes doing Particle Tracking Velocimetry
that is very computationally intense.” –Jules Jaffe
Requires Machine Learning
for Automated Image Analysis and Classification
CNN vs Random Forest (SVM)
New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure
Adding a Machine Learning Layer Built on Top of the Pacific Research Platform
Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU
NSF Grant for High Speed “Cloud” of 256 GPUs
For 30 ML Faculty & Their Students at 10 Campuses
for Training AI Algorithms on Big Data
Machine Learning Researchers
Need a New Cyberinfrastructure
“Until cloud providers are willing to find a solution
to place commodity (32-bit) game GPUs into their servers
and price services accordingly,
I think we will not be able to leverage the cloud effectively.”
“There is an actual scientific infrastructure need here,
surprisingly unmet by the commercial market,
and perhaps CHASE-CI is the perfect catalyst to break this logjam.”
--UC Berkeley Professor Trevor Darrell
FIONAs and FIONettes – Flash I/O Network Appliances
Solve the PRP Disk-to-Disk Data Transfer Problem
Multi-Tenant Containerized GPU JupyterHub
Running Kubernetes / CoreOS
Eight Nvidia GTX-1080 Ti GPUs
~$13K
32GB RAM, 3TB SSD, 40G & Dual 10G ports
Source: John Graham, Calit2
Now Adding GPUs to Support Data Science Machine Learning
Single vs. Double Precision GPUs:
Gaming vs. Supercomputing
8 x 1080 Ti: 1 Million GPU
Core-Hours Every 2 Days,
Cost of a Starbucks Latte.
500 Million GPU Core-Hours
for $14K in 3yrs
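The core-hour claims can be checked directly, assuming Nvidia's published figure of 3,584 CUDA cores per GTX 1080 Ti:

```python
CORES_PER_GPU = 3584              # CUDA cores on a GTX 1080 Ti
cores = 8 * CORES_PER_GPU         # one 8-GPU FIONA: 28,672 cores
per_two_days = cores * 48         # core-hours in 48 wall-clock hours
print(per_two_days)               # → 1376256, i.e. ~1M core-hours per 2 days
three_years = cores * 24 * 365 * 3
print(three_years)                # → 753500160 at 100% utilization
```

At full utilization three years yield about 754M core-hours, so the quoted 500M implies a duty cycle of roughly two-thirds (an inference, not stated on the slide).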
48 GPUs for
OSG Applications
UCSD Adding >350 Game GPUs to Data Sciences Cyberinfrastructure -
Devoted to Data Analytics and Machine Learning
SunCAVE 70 GPUs
WAVE + Vroom 48 GPUs
FIONA with
8-Game GPUs
88 GPUs
for Students
CHASE-CI Grant Provides
96 GPUs at UCSD
for Training AI Algorithms on Big Data
The Future of Supercomputing Will Blend Traditional HPC and Data Analytics
Integrating Non-von Neumann Architectures
“High Performance Computing Will Evolve
Towards a Hybrid Model,
Integrating Emerging Non-von Neumann Architectures,
with Huge Potential in Pattern Recognition,
Streaming Data Analysis,
and Unpredictable New Applications.”
Horst Simon, Deputy Director,
U.S. Department of Energy’s
Lawrence Berkeley National Laboratory
Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab
For Machine Learning on GPUs and von Neumann and NvN Processors
Source: Dr. Dharmendra Modha
Founding Director, IBM Cognitive Computing Group
August 8, 2014
UCSD ECE Professor Ken Kreutz-Delgado Brings
the IBM TrueNorth Chip
to Start Calit2’s Qualcomm Institute
Pattern Recognition Laboratory
September 16, 2015
Next Step: Surrounding the UCSD Data Sciences Machine Learning Platform
With Clouds of GPUs and Non-Von Neumann Processors
Microsoft Installs Altera FPGAs
into Bing Servers &
384 into TACC for Academic Access
64-TrueNorth
Cluster
CHASE-CI 64-bit GPUs
Our Support:
• US National Science Foundation (NSF) awards
– CNS-0821155, CNS-1338192, CNS-1456638, CNS-1730158, ACI-1540112, & ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, PacificWave and StarLight
• DOE ESnet
