Grand Challenge Lecture
Big Data and the Earth Sciences: Grand Challenges Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 31, 2017
Using the Pacific Research Platform for Earth Sciences Big Data
1. “Using the Pacific Research Platform
for Earth Sciences Big Data”
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. Vision: Integrated 10Gbps Lightpath
Cyberinfrastructure System for Big Data Sciences
National LambdaRail
Campus Optical Switch
Data Repositories & Clusters
HPC
HD/4k Video Images
HD/4k Video Cams
End User OptIPortal
10G Lightpath
HD/4k Telepresence
Instruments
LS 2009 Slide
3. DOE ESnet’s Science DMZ: A Scalable Network
Design Model for Optimizing Science Data Transfers
• A Science DMZ integrates 4 key concepts into a unified whole:
– A network architecture designed for high-performance applications,
with the science network distinct from the general-purpose network
– The use of dedicated systems as data transfer nodes (DTNs)
– Performance measurement and network testing systems that are
regularly used to characterize and troubleshoot the network
– Security policies and enforcement mechanisms that are tailored for
high performance science environments
http://fasterdata.es.net/science-dmz/
Science DMZ
Coined 2010
The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report Formed the Basis
for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program
4. Based on Community Input and on ESnet’s Science DMZ Concept,
NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways
Red 2012 CC-NIE Awardees
Yellow 2013 CC-NIE Awardees
Green 2014 CC*IIE Awardees
Blue 2015 CC*DNI Awardees
Purple Multiple Time Awardees
Source: NSF
5. Big Data Science Data Transfer Nodes:
Flash I/O Network Appliances (FIONAs)
UCSD Designed FIONAs
To Solve the Disk-to-Disk
Data Transfer Problem
at Full Speed
on 10G, 40G and 100G Networks
FIONAs: 10/40G, $8,000
FIONette: 1G, $1,000
Phil Papadopoulos, SDSC &
Tom DeFanti, Joe Keefe & John Graham, Calit2
John Graham, Calit2
6. How UCSD DMZ Network Transforms Big Data Microbiome Science:
Preparing for Knight/Smarr 1 Million Core-Hour Analysis
Knight Lab
FIONA
10Gbps
Gordon
Prism@UCSD
Data Oasis: 7.5PB, 200GB/s
Knight 1024 Cluster in SDSC Co-Lo
CHERuB
100Gbps
Emperor & Other Vis Tools
64Mpixel Data Analysis Wall
120Gbps
40Gbps
1.3Tbps
7. Logical Next Step: The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Superhighway” System
NSF Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
8. PRP Continues to Expand Rapidly While Increasing Connectivity:
1 1/2 Years of Progress – 12 Sites to 24 Sites
January 29, 2016
Connected 24 DMZ FIONAs
at 10G and 40G
April 24, 2017
Source: John Graham, Calit2
9. PRP Allows for Multiple Secure Independent
Cooperating Research Groups
• Any Particular Science Driver Comprises
Scientists and Resources at a Subset of
Campuses and Resource Centers
• We Term These Science Teams, Together with
the Resources and Instruments They Access,
Cooperating Research Groups (CRGs).
• Members of a Specific CRG Trust One Another,
But They Do Not Necessarily Trust Other CRGs
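The trust rule above can be sketched as a simple membership check. This is a minimal illustration only, not PRP's actual security mechanism: the `CRG` class, the `may_access` helper, and all group, member, and resource names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CRG:
    """A Cooperating Research Group: a science team plus the resources it accesses."""
    name: str
    members: frozenset
    resources: frozenset


def may_access(user, resource, groups):
    """Return True only if some single CRG contains both the user and the resource."""
    return any(user in g.members and resource in g.resources for g in groups)


# Hypothetical groups, members, and resource names, for illustration only.
groups = [
    CRG("seismology", frozenset({"alice", "bob"}), frozenset({"fiona-a"})),
    CRG("climate", frozenset({"carol"}), frozenset({"fiona-b"})),
]

print(may_access("alice", "fiona-a", groups))  # same CRG
print(may_access("alice", "fiona-b", groups))  # different CRG
```

Trust is scoped per group here: sharing a platform does not by itself grant a member of one CRG access to another CRG's resources.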
10. PRP’s First 1.5 Years:
Connecting Multi-Campus Application Teams and Devices
11. PRP Will Link the Laboratories of
the Pacific Earthquake Engineering Research Center
http://peer.berkeley.edu/
PEER Labs: UC Berkeley, Caltech, Stanford,
UC Davis, UC San Diego, and UC Los Angeles
John Graham Installing FIONette at PEER Feb 10, 2017
12. PRP Allows Researchers to Bring Datasets from NERSC
to Their Local Clusters for In-Depth Science Analysis
100 Gbps FIONA at UCSC Connects
the UCSC Hyades Cluster
to the NERSC Supercomputer at LBNL
for the Dark Energy Spectroscopic Instrument (DESI)
and AGORA Galaxy Simulation Data
Produced at NERSC.
250 images per night
800GB per night
UCSC Feb 7, 2017
13. 40G FIONAs
20x40G PRP-connected
WAVE@UC San Diego
PRP Now Enables
Distributed Virtual Reality
PRP
WAVE @UC Merced
Transferring 5 CAVEcam Images from UCSD to UC Merced:
2 Gigabytes now takes 2 Seconds (8 Gb/sec)
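The quoted rate follows from the byte-to-bit conversion. A minimal check, assuming decimal units (1 Gigabyte = 8 Gigabits); the function name `throughput_gbps` is illustrative:

```python
def throughput_gbps(size_gigabytes, seconds):
    """Average throughput in gigabits per second (1 byte = 8 bits, decimal units)."""
    return size_gigabytes * 8 / seconds


rate = throughput_gbps(2, 2)
print(f"{rate:.0f} Gb/s")  # 8 Gb/s
```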
14. Increasing Participation Through
PRP Science Engagement Workshops
Source: Camille Crittenden, UC Berkeley
UC San Diego
UC Merced
UC Davis UC Berkeley
15. Four Examples of Earth Sciences Research Teams
That Are in the Early Stages of Using PRP
John Delaney - Undersea Cabled Observatory
Frank Vernon - Expansion of HPWREN
Dan Cayan, Mike Dettinger - Regional Downscaling of Climate Models
Scott Sellars, Marty Ralph - Center for Western Weather and Water Extremes
16. Director: F. Martin Ralph Website: cw3e.ucsd.edu
Big Data Collaboration with:
Source: Scott Sellars, CW3E
Collaboration on Atmospheric Water in the West
Between UC San Diego and UC Irvine
Director: Soroosh Sorooshian, UC Irvine Website: http://chrs.web.uci.edu
17. • CONNected objECT (CONNECT) Algorithm, developed at UCI-CHRS
– Team: Wei Chu, Scott Sellars, Phu Nguyen, Xiaogang Gao, Kuo-lin Hsu, and Soroosh Sorooshian
– Most algorithms do not track events over their life cycles
CONNECT: Object Segmentation
Object Storage (PostgreSQL)
Data:
1. 60N-60S, 0-360 lat and long
2. Hourly time step
3. March 1st, 2000 to January 1st, 2011
Set Object Criteria:
1. Each voxel must have 1mm/hr
2. Each object must exist for 24 hours
3. 6 voxel connections
Database Indexes:
1. Object ID Number
2. Latitude (of each voxel in objects)
3. Longitude (of each voxel in objects)
4. Time (hour)
Figure: Data Hypercube (axes: Longitude, Time), panels t=1 through t=5; label: 5mm/hr Rainfall
Convert Global Precipitation Maps to
a Database of Precipitation Spacetime Objects
Source: (Sellars et al. 2013, 2015)
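The object criteria on this slide (rate threshold, 24-hour lifetime, 6 voxel connections) can be sketched as connected-component labeling over a time-latitude-longitude grid. This is a minimal sketch, not the CONNECT implementation: it assumes "must have 1mm/hr" means at least that rate, measures lifetime as the span of hourly steps an object covers, and ignores longitude wrap-around. The function name `segment_objects` and the toy data are hypothetical.

```python
from collections import deque

# 6-connectivity: neighbors differ by one step along exactly one of (time, lat, lon).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def segment_objects(rain, min_rate=1.0, min_hours=24):
    """rain[t][i][j] is the hourly rate in mm/hr.

    Returns a list of objects, each a list of (t, i, j) voxels, keeping only
    objects whose time span covers at least min_hours hourly steps.
    """
    nt, ni, nj = len(rain), len(rain[0]), len(rain[0][0])
    seen = set()
    objects = []
    for t in range(nt):
        for i in range(ni):
            for j in range(nj):
                if (t, i, j) in seen or rain[t][i][j] < min_rate:
                    continue
                # Breadth-first flood fill from this unvisited wet voxel.
                seen.add((t, i, j))
                queue = deque([(t, i, j)])
                voxels = []
                while queue:
                    ct, ci, cj = queue.popleft()
                    voxels.append((ct, ci, cj))
                    for dt, di, dj in NEIGHBORS:
                        n = (ct + dt, ci + di, cj + dj)
                        if (0 <= n[0] < nt and 0 <= n[1] < ni and 0 <= n[2] < nj
                                and n not in seen
                                and rain[n[0]][n[1]][n[2]] >= min_rate):
                            seen.add(n)
                            queue.append(n)
                hours = {v[0] for v in voxels}
                if max(hours) - min(hours) + 1 >= min_hours:
                    objects.append(voxels)
    return objects


# Toy 3-hour, 2x2 grid; min_hours lowered to 2 so the example stays small.
toy = [
    [[2.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 3.0]],
]
objs = segment_objects(toy, min_rate=1.0, min_hours=2)
print(len(objs))  # 1: the isolated voxel at t=2 lasts only one hour and is dropped
```

Each returned voxel carries the time, latitude index, and longitude index that the slide lists as database indexes, so an object ID could be assigned per returned list before storage.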
18. Using Machine Learning to Determine
the Precipitation Object Starting Locations
*Sellars et al., 2017 (in prep)
19. Calit2’s FIONA
SDSC’s COMET
Calit2’s FIONA
Pacific Research Platform (10-100 Gb/s)
GPUs GPUs
Complete workflow time: 20 days → 20 hrs → 20 Minutes!
UC Irvine UC San Diego
Improvement of Over 1000x With PRP
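The "over 1000x" figure can be checked by converting each workflow time to minutes. A minimal sketch; the helper name `speedup` is illustrative:

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24

t_days = 20 * HOURS_PER_DAY * MINUTES_PER_HOUR  # 20 days, in minutes
t_hours = 20 * MINUTES_PER_HOUR                 # 20 hours, in minutes
t_minutes = 20                                  # 20 minutes


def speedup(before_minutes, after_minutes):
    """Ratio of the earlier workflow time to the later one."""
    return before_minutes / after_minutes


print(speedup(t_days, t_hours))     # 24.0
print(speedup(t_hours, t_minutes))  # 60.0
print(speedup(t_days, t_minutes))   # 1440.0
```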
20. Dan Cayan
USGS Water Resources Discipline
Scripps Institution of Oceanography, UC San Diego
much support from Mary Tyree, Mike Dettinger, Guido Franco and other colleagues
Sponsors:
California Energy Commission
NOAA RISA program
California DWR, DOE, NSF
Planning for climate change in California
substantial shifts on top of already high climate variability
SIO Campus Climate Researchers Need to Download
Results from NCAR Remote Supercomputer Simulations
to Make Regional Climate Change Forecasts
NCAR Upgrading to 10Gbps Link from Wyoming and Boulder to CENIC/PRP
21. Downscaling Supercomputer Climate Simulations
To Provide High Res Predictions for California Over Next 50 Years
Figure panels (2): average summer afternoon temperature
Source: Hugo Hidalgo, Tapash Das, Mike Dettinger
22. We Scale the Working PRP by Providing Multi-Campus Application Teams
With Disk-to-Disk Measurements
UIC
UCSD
UCI
U Hawaii
USC
NCAR
SDSU
23. The High Performance Wireless Research and Education Network (HPWREN)
Supports Earth and Environmental Sciences in SoCal
https://hpwren.ucsd.edu/
Source: Frank Vernon,
Hans Werner Braun HPWREN
24. HPWREN Provides Wireless Backhaul for Seismic Stations
on SoCal Fault Lines
Source: Frank Vernon,
Hans Werner Braun HPWREN
25. HPWREN Real-Time Network Cameras on Mountains
for Environmental Observations and Fires
Source: Frank Vernon,
Hans Werner Braun HPWREN
26. Why? 14 May 2014:
9 Simultaneous Active Fires in San Diego County
San Diego County Red Mountain Fire Cameras
• Southeast (left) “Highway” Fire
• Southwest (center rear) “Poinsettia” Fire
• West (right) “Tomahawk” Fire
Source: Frank Vernon,
Hans Werner Braun HPWREN
27. PRP Backbone Sets Stage for 2017 Expansion
of HPWREN, Connected to CENIC, into Orange and Riverside Counties
• PRP CENIC 100G Link
UCSD to SDSU
– DTN FIONAs Endpoints
– Data Redundancy
– Disaster Recovery
– High Availability
– Network Redundancy
• Anchor to CENIC at UCI
– PRP FIONA Connects to
CalREN-HPR Network
– Data Replication Site
• Potential Future UCR
CENIC Anchor
UCR
UCI
UCSD
SDSU
Source: Frank Vernon,
Hans Werner Braun HPWREN
29. Slide from John Delaney
Univ. of Washington
John Delaney Visiting UCSD’s SIO
For Three Months in 2017
30. Being There - Remote Live High Definition Video
of Deep Sea Hydrothermal Vents
http://novae.ocean.washington.edu/story/Ashes_CAMHD_Liv
Mushroom Hydrothermal Vent
on Axial Seamount
1 Mile Below Sea Level
Picture Created
From 40 HD Frames
14 Minutes Live HD Video
On-Line Every 3 Hours
15 feet
Slide Courtesy, John Delaney, UWash
31. The Future of Supercomputing Will Blend Traditional HPC and Data Analytics
Integrating Non-von Neumann Architectures
“High Performance Computing Will Evolve
Towards a Hybrid Model,
Integrating Emerging Non-von Neumann Architectures,
with Huge Potential in Pattern Recognition,
Streaming Data Analysis,
and Unpredictable New Applications.”
Horst Simon, Deputy Director,
U.S. Department of Energy’s
Lawrence Berkeley National Laboratory
32. Brain-Inspired Processors
Are Accelerating the Non-von Neumann Architecture Era
“On the drawing board are collections of 64, 256, 1024, and 4096 chips.
‘It’s only limited by money, not imagination,’ Modha says.”
Source: Dr. Dharmendra Modha
Founding Director, IBM Cognitive Computing Group
August 8, 2014
33. Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab
For Machine Learning on non-von Neumann Processors
Source: Dr. Dharmendra Modha
Founding Director, IBM Cognitive Computing Group
August 8, 2014
UCSD ECE Professor Ken Kreutz-Delgado Brings
the IBM TrueNorth Chip
to Start Calit2’s Qualcomm Institute
Pattern Recognition Laboratory
September 16, 2015
34. Plans for ~500 Game GPUs Deployed on the Pacific Research Platform
Devoted to Machine Learning
Caltech
UCB
UCI UCR
UCSD
UCSC
Stanford
MSU
UCM
SDSU
High Speed “Cloud” of 320 GPUs
for Training AI Algorithms on Big Data
SunCAVE 70 GPUs
48 GPUs
for Applications
48 GPUs
for Students
FIONA with
8 Game GPUs
35. Announcing the First National Research Platform
Workshop August 7-8, 2017
Co-Chairs:
Larry Smarr, Calit2
& Jim Bottum, Internet2
See pacificresearchplatform.org
for Registration Information
36. Expanding to the Global Research Platform
Via CENIC/Pacific Wave, Internet2, and International Links
PRP’s Current
International
Partners
Korea Shows Distance is Not the Barrier
to Above 5Gb/s Disk-to-Disk Performance
37. Our Support:
• US National Science Foundation (NSF) awards CNS-0821155,
CNS-1338192, CNS-1456638, ACI-1540112, and ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, PacificWave and StarLight
• DOE ESnet