In this deck from the 2019 Stanford HPC Conference, Nick Nystrom from the Pittsburgh Supercomputing Center presents: Pioneering and Democratizing Scalable HPC+AI.
"PSC's Bridges was the first system to successfully converge HPC, AI, and Big Data. Designed for the U.S. national research community and supported by NSF, Bridges now serves approximately 1800 projects and 7500 users at 380 institutions, and it is the foundation around which new HPC+AI projects have launched. Bridges emphasizes "nontraditional" uses that span the life, physical, and social sciences, computer science, engineering, business, and humanities. Scalable HPC+AI is driving many of those applications, which span diverse topics such as learning root causes of cancer, strategic reasoning, designing new materials, predicting severe storms, recognizing speech including contextual information, and detecting objects in 4k streaming video. To address the demand for scalable AI, PSC recently introduced Bridges-AI, which adds transformative new AI capability. In this presentation, we share our vision in designing HPC+AI systems at PSC and highlight some of the exciting research breakthroughs they are enabling."
Nick Nystrom is Interim Director and Sr. Director of Research at the Pittsburgh Supercomputing Center (PSC). Nick is architect and PI for Bridges, PSC's flagship system that successfully pioneered the convergence of HPC, AI, and Big Data. He is also PI for the NIH Human Biomolecular Atlas Program’s HIVE Infrastructure Component and co-PI for projects that bring emerging AI technologies to research (Open Compass), apply machine learning to biomedical data for breast and lung cancer (Big Data for Better Health), and identify causal relationships in biomedical big data (the Center for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence). His current research interests include hardware and software architecture, applications of machine learning to multimodal data (particularly for the life sciences) and to enhance simulation, and graph analytics.
Watch the video: https://youtu.be/ucRs4A_afus
Learn more: https://www.psc.edu/bridges
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
2. Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
3.
What is PSC?
Advise and support industry
• Training, access to advanced resources,
collaborative research
Education and training
• Lead national & local workshops
• Support courses at CMU and elsewhere
• Teaching, thesis committees, interns
Active member in the CMU and Pitt communities
• Research collaborations
• Colocation for lower cost and greater capability
PSC is a joint effort of
Carnegie Mellon University
and the University of Pittsburgh.
33 years of leadership in HPC,
HPDA, and computational science.
21 HPC systems, 10 of which
were the first of their kind or unique.
Pioneering the convergence
of AI + HPC + data.
Research institution advancing knowledge
through converged HPC, AI, and Big Data
• ~30 active funded projects
Networking and security
• Networking & security service provider
• Research networking
National service provider for research and discovery
• Bridges, Anton 2, Brain Image Library,
Open Compass, XSEDE, Olympus
4.
Research Needs Converged HPC, AI, and Data
Data sources range from structured, regular, and homogeneous to unstructured, irregular, and heterogeneous:
Pan-STARRS telescope: http://pan-starrs.ifa.hawaii.edu/public/
Genome sequencers (Wikipedia Commons)
Collections (Horniman museum): http://www.horniman.ac.uk/get_involved/blog/bioblitz-insects-reviewed
Legacy documents (Wikipedia Commons)
Environmental sensors: water temperature profiles from tagged hooded seals: http://www.arctic.noaa.gov/report11/biodiv_whales_walrus.html
Library of Congress stacks: https://www.flickr.com/photos/danlem2001/6922113091/
Video (Wikipedia Commons)
Social networks and the Internet
Wearable sensors (F. De Roose et al.): https://techxplore.com/news/2016-12-smart-contact-lens-discussed-electron.html
Detecting cancer: https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
The Human BioMolecular Atlas Program: https://commonfund.nih.gov/hubmap
BlueTides astrophysics simulation: http://bluetides-project.org/
5.
Enabling the Creation of Knowledge
Common Goal
Enable the creation of knowledge
• Democratize HPC, Big Data, and AI
• Enable research areas that have not
previously used HPC
• Advance previously traditional fields
through machine learning and data
analytics
• Couple applications in novel ways
Objectives
Enable data-intensive applications & workflows
• Deliver HPC Software as a Service
(Science Gateways)
• Deliver Big Data as a Service (BDaaS)
• Provide scalable deep learning, machine learning, and
graph analytics
• Support very large in-memory databases
• Facilitate data assimilation from instruments and the
Internet
Scale beyond the laptop and to interdisciplinary,
collaborative teams
6.
The Rapid Growth of AI
From: Artificial Intelligence Index: 2018 Annual Report (Stanford University, 2018)
7. Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
8. Bridges converges HPC, AI, and Big Data to empower new research communities, bring desktop convenience
to advanced computing, expand remote access, and help researchers to work more intuitively.
• Funded by NSF award #OAC-1445606 ($20.9M), Bridges emphasizes usability, flexibility, and interactivity
• Available at no charge for open research and coursework and by arrangement to industry
• Popular programming languages and applications: Python, Jupyter, R, MATLAB, Java, Spark, Hadoop, …
• 856 compute nodes containing Intel Xeon CPUs and 128GB (800), 3TB (42), and 12TB (4) of RAM each
• 216 NVIDIA Tesla GPUs: 64 K80, 64 P100, (new) 88 V100 configured to balance capability & capacity
• Dedicated nodes for persistent databases, gateways, and distributed services
• The world’s first deployment of the Intel Omni-Path Architecture fabric
• Available at no cost for open research and courses
and by arrangement to industry
• Easier access for CMU and Pitt faculty through
the Pittsburgh Research Computing Initiative
• 29,036 Intel Xeon CPU cores
• 216 NVIDIA GPUs: 64 K80, 64 P100, 88 V100
• 17PB storage (10PB persistent, 7.3PB local)
• 277TB memory (RAM), up to 12TB per node
• 44M core-hours, 173k GPU-AI-hours,
442k GPU-hours, and 343k TB-hours allocated
quarterly
• Serving ~1,850 projects and ~7500 users at
393 institutions, spanning 119 fields of study
• Bridges-AI: NVIDIA DGX-2 Enterprise AI system
+ 9 HPE 8-Volta Apollo 6500 Gen10 servers:
total of 88 V100 GPUs
9. delivered Bridges, and is now
delivering Bridges GPU-AI
All trademarks, service marks, trade names, trade dress, product names, and logos appearing herein are the property of their respective owners.
Acquisition and operation of Bridges are made
possible by the National Science Foundation
through award #OAC-1445606 ($20.9M):
Bridges: From Communities and Data to
Workflows and Insight
10.
Bridges Makes Advanced Computing Easy
Elements not available in traditional
supercomputers
Make HPC accessible to all research communities
Converge HPC, AI, and Big Data
Support the widest range of science with an extremely rich
computing environment
• 3 tiers of memory: 12 TB, 3 TB, and 128 GB
• Powerful, flexible CPUs and GPUs
• Familiar, easy-to-use user environment:
– Interactivity
– Popular languages and frameworks:
Python, Anaconda, R, MATLAB, Java, Spark, Hadoop
– AI frameworks: TensorFlow, Caffe2, PyTorch, etc.
– Containers (e.g., NGC) and virtual machines (VMs)
– Databases
– Gateways and distributed (web) services
– Large collection of applications and libraries
11.
Conceptual Architecture
Intel Omni-Path
Architecture
fabric
Management
nodes
Parallel File
System
Web Server
nodes
Database
nodes
Data Transfer
nodes
Login
nodes
Users,
XSEDE,
campuses,
instruments
ESM Nodes
12TB RAM
4 nodes
LSM Nodes
3TB RAM
42 nodes
RSM Nodes
128GB RAM
800 nodes,
48 with GPUs
Bridges-AI
NVIDIA DGX-2
(16 V100 GPUs)
9x HPE A6500
(9x 8 V100 GPUs)
Introduced in
Operations Year 3
12.
748 HPE Apollo 2000 (128GB)
compute nodes
20 “leaf” Intel® OPA edge switches
6 “core” Intel® OPA edge switches:
fully interconnected,
2 links per switch
42 HPE ProLiant DL580 (3
TB) compute nodes
20 Storage Building Blocks,
implementing the parallel Pylon
storage system (10 PB usable)
4 HPE Integrity
Superdome X (12TB)
compute nodes …
12 HPE ProLiant DL380
database nodes
6 HPE ProLiant DL360
web server nodes
4 MDS nodes
2 front-end nodes
2 boot nodes
8 management nodes
Intel® OPA cables
… each with 2 gateway nodes
Purpose-built Intel® Omni-Path
Architecture topology for
data-intensive HPC
16 HPE Apollo 2000 (128GB) GPU nodes
with 2 NVIDIA Tesla K80 GPUs each
32 HPE Apollo 2000 (128GB) GPU nodes
with 2 NVIDIA Tesla P100 GPUs each
Representative uses for AI: simulation (including AI-enabled); ML, inferencing, DL development, Spark, HPC AI (Libratus); distributed training, Spark, etc.
Robust paths to parallel storage
Project & community datasets
Large-memory Java & Python
User interfaces for AIaaS, BDaaS
Bridges Virtual Tour: https://psc.edu/bvt
Maximum-Scale Deep Learning (Bridges-AI):
NVIDIA DGX-2 and 9 HPE Apollo 6500 Gen10 nodes: 88 NVIDIA Tesla V100 GPUs
13. Accessing Bridges: No Cost for Research & Education and
Cost-Recovery Rates for Corporate Use

                Open Research                                Industry
                Startup      Research       Education       (PSC Corporate Program)
Cost            No charge    No charge      No charge       Cost-recovery rates
CPU-hours       50k          Up to ~10^7    Up to ~10^6     Up to ~18M
GPU-hours       2,500        Up to ~10^5    Up to ~10^4     Up to ~180k
GPU-AI hours    1,500        Up to ~10^5    Up to ~10^4     Up to ~69k
TB-hours        1,000        Up to ~10^4    Up to ~10^4     Up to ~137k
Developer       Yes          Yes            (Yes)           Yes
Accepted        Any time     Quarterly      Any time        Any time
Awarded         ~1-2 days    Quarterly      ~1-3 days       ASAP

The following annual allocations are renewable and extendable, also at no cost for research and education.
14. Interactivity is the feature most frequently
requested by nontraditional HPC communities.
– Interactivity provides immediate
feedback for doing exploratory
data analytics and testing hypotheses.
– Bridges offers interactivity through a combination of shared,
dedicated, and persistent resources to maximize availability while
accommodating diverse needs.
Interactivity
16. Gateways provide easy-to-use access to Bridges’ HPC and data resources, allowing users to
launch jobs, orchestrate complex workflows, and manage data from their browsers.
– Provide “HPC Software-as-a-Service”
– Extensive use of VMs, databases, and distributed services
Gateways and Tools for Building Them
Galaxy (PSU, Johns Hopkins)
https://galaxyproject.org/
The Causal Web (Pitt, CMU)
http://www.ccd.pitt.edu/tools/
Neuroscience Gateway (SDSC)
17. Dedicated database nodes power persistent relational and
NoSQL databases
– Support data management and data-driven workflows
– SSDs for high IOPs; HDDs for high capacity
Dedicated web server nodes
– Enable distributed, service-oriented architectures
– High-bandwidth connections to XSEDE and the Internet
Databases and Distributed/Web Services
(examples)
18. • 1 NVIDIA DGX-2
Tightly couples 16 NVIDIA Tesla V100 (Volta) GPUs
at 2.4TB/s bisection bandwidth, to provide maximum
capability for the most demanding of AI challenges
• 9 Hewlett Packard Enterprise Apollo 6500 Gen10 servers
Each with 8 NVIDIA Tesla V100 GPUs connected by
NVLink 2.0, to balance great AI capability and capacity
• Bridges-AI is integrated with Bridges and allocated through
XSEDE as resource “Bridges GPU-AI”, analogous to Bridges GPU, RM, LM, and Pylon
• Bridges-AI adds 9.9 Pf/s of mixed-precision tensor, 1.24Pf/s of fp32, and 0.62Pf/s of fp64. (Totals:
9.9Pf/s tensor, 3.93 Pf/s fp32, 1.97 Pf/s fp64).
• The $1.786M supplement includes additional staffing to support solutions and scaling
• Deployment: Bridges-AI deployed on time. PSC ran an Early User Program in November and
December 2018, and production operations began January 1, 2019.
Bridges-AI: Overview
Volta introduces Tensor Cores to
accelerate neural networks, yielding
extremely high peak performance for
appropriate applications.
Bridges-AI provides massive
aggregate performance:
• 9.9 Pf/s mixed-precision tensor
• 1.24 Pf/s 32-bit
• 0.62 Pf/s 64-bit
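As a quick sanity check, the aggregate figures quoted on this slide (9.9 Pf/s mixed-precision tensor, 1.24 Pf/s fp32, and 0.62 Pf/s fp64 added across 88 Volta GPUs) can be divided back down to per-GPU numbers. This is only back-of-the-envelope arithmetic on the slide's totals, not official NVIDIA specifications:

```python
# Back-of-the-envelope check of the Bridges-AI peak-performance totals
# quoted on this slide. The per-GPU values are simply the totals divided
# by the GPU count, not official NVIDIA specifications.

N_GPUS = 16 + 9 * 8  # one DGX-2 plus nine 8-GPU Apollo 6500 servers = 88

totals_pflops = {"tensor": 9.9, "fp32": 1.24, "fp64": 0.62}

# Convert Pf/s totals to Tf/s per GPU.
per_gpu_tflops = {k: v * 1000 / N_GPUS for k, v in totals_pflops.items()}

for kind, tflops in per_gpu_tflops.items():
    print(f"{kind}: {tflops:.1f} Tf/s per V100")
```

The result (about 112 Tf/s tensor, 14 Tf/s fp32, 7 Tf/s fp64 per GPU) is consistent with V100 peak rates, which is a useful cross-check on the aggregate claims.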
19. New Streaming Multiprocessor (SM) architecture, introducing
Tensor Cores, independent thread scheduling, combined L1 data cache and shared
memory unit, and 50% higher energy efficiency over Pascal.
Tensor Cores accelerate deep learning training and inference, providing up to 12× and
6× higher peak flops respectively over the P100 GPUs currently available in XSEDE.
NVLink 2.0 delivering 300 GB/s total bandwidth per GV100, nearly 2× higher than
P100.
HBM2 bandwidth and capacity increases: 900 GB/s and up to 32GB.
Enhanced Unified Memory and Address Translation Services improve accuracy of
memory page migration by providing new access counters.
Cooperative Groups and New Cooperative Launch APIs expand the programming
model to allow organizing groups of communicating threads.
Volta-Optimized Software includes new versions of frameworks and libraries optimized
to take advantage of the Volta architecture: TensorFlow, Caffe2, MXNet, CNTK,
cuDNN, cuBLAS, TensorRT, etc.
The Heart of Bridges-AI: NVIDIA Volta
NVIDIA Tesla V100 SXM2 Module
with Volta GV100 GPU
Training ResNet-50 with ImageNet:
V100: 1075 images/s [a]
P100: 219 images/s [b]
K80: 52 images/s [b]
a. https://devblogs.nvidia.com/tensor-core-ai-performance-milestones/
b. https://www.tensorflow.org/performance/benchmarks
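To put those throughput figures in perspective, the following sketch converts them into single-GPU time per pass over the ImageNet (ILSVRC-2012) training set of roughly 1.28M images. The epoch times are derived arithmetic, not benchmark results:

```python
# Putting the ResNet-50 training throughputs quoted above into context:
# time for one pass over the ImageNet training set on a single GPU of
# each generation, using the images/s figures from this slide.

IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training-set size

throughput = {"V100": 1075, "P100": 219, "K80": 52}  # images/s

for gpu, ips in throughput.items():
    hours = IMAGENET_TRAIN_IMAGES / ips / 3600
    print(f"{gpu}: {hours:.2f} h per epoch")

speedup_v100_vs_k80 = throughput["V100"] / throughput["K80"]  # about 20.7x
```

A single V100 finishes an epoch in roughly 20 minutes versus nearly 7 hours on a K80, which is the practical meaning of the generational jump.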
20. Bridges-AI adds 9 HPE Apollo 6500 Gen10 servers
Each HPE Apollo 6500 couples 8 NVIDIA Tesla V100 SXM2 GPUs
– 40,960 CUDA cores and 5,120 tensor cores
Performance: 1Pf/s mixed-precision tensor, 125Tf/s 32b, 64Tf/s 64b
Memory: 128GB HBM2, 7.2TB/s aggregate memory bandwidth
2×Intel Xeon Gold 6148 CPUs and 192GB of DDR4-2666 RAM
– 20c, 2.4–3.7GHz, 27.5MB L3, 3 UPI links
4×2TB NVMe SSDs for user and system data
1×Intel Omni-Path host channel adapter
Hybrid cube-mesh topology connecting the 8 V100 GPUs and 2 Xeon
CPUs, using NVLink 2.0 between the GPUs and PCIe3 to the CPUs
Balancing AI Capability & Capacity: HPE Apollo 6500
HPE Apollo 6500 Gen10
hybrid cube-mesh topology
HPE Apollo 6500 Gen10 Server
21. Couples 16 NVIDIA Tesla V100 SXM2 GPUs
– 81,920 CUDA cores and 10,240 tensor cores
Performance: 2Pf/s mixed-precision tensor, 251Tf/s 32b, 125Tf/s 64b
Memory: 512GB HBM2, 14.4TB/s aggregate memory bandwidth
2×Intel Xeon Platinum 8168 CPUs and 1.5TB of DDR4-2666 RAM
– 24c, 2.7–3.7GHz, 33 MB L3, 3 UPI links
2×960GB NVMe SSDs host the Ubuntu Linux OS
8×3.84 TB NVMe SSDs (aggregate ~30 TB)
8×Mellanox ConnectX adapters for EDR InfiniBand & 100 Gb/s Ethernet
The NVSwitch tightly couples the 16 V100 GPUs for capability & scaling
– Each of the 12 NVSwitch chips is an 18×18-port, fully-connected crossbar
– 50 GB/s/port and 900 GB/s/chip bidirectional bandwidths
– 2.4TB/s system bisection bandwidth
Maximum DL Capability: NVIDIA DGX-2
NVIDIA DGX-2
NVIDIA DGX-2 with NVSwitch
internal topology
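The 2.4 TB/s bisection figure follows directly from the per-GPU NVLink bandwidth. Splitting the 16 GPUs into two halves of 8, each GPU on one side can drive its full 300 GB/s through the NVSwitch crossbars; a minimal arithmetic check:

```python
# Sanity check of the DGX-2 bisection-bandwidth figure quoted above.
# With the 16 V100s split into two halves of 8, the bisection is limited
# by the 8 GPUs on one side of the cut, each driving its full NVLink 2.0
# bandwidth through the NVSwitch fabric.

GPUS_PER_HALF = 16 // 2        # 8 GPUs on each side of the bisection
NVLINK_GBPS_PER_GPU = 300      # total NVLink 2.0 bandwidth per GV100

bisection_tbps = GPUS_PER_HALF * NVLINK_GBPS_PER_GPU / 1000
print(f"bisection bandwidth: {bisection_tbps} TB/s")  # 2.4 TB/s
```

In other words, NVSwitch is provisioned so that every GPU can talk across the bisection at full NVLink rate simultaneously.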
23. Containers enable reproducible, cloud-interoperable workflows
and simplify deployment of applications and frameworks
– PSC is a key partner of the Critical Assessment of Metagenome Interpretation (CAMI)
project for reproducible evaluation of metagenomics tools
– CAMI and the DOE Joint Genome Institute defined the bioboxes standard for Docker
containers encapsulating bioinformatics tools
Docker images can be converted to Singularity images and run on Bridges
– Certain vetted Docker containers are also supported
Containers
Interoperability
with clouds and
other resources
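The Docker-to-Singularity path described above can be sketched as two commands: `singularity build` with a `docker://` URI converts the image, and `singularity exec` runs a tool inside it. The image name and file paths below are placeholders, not a specific container used on Bridges:

```python
# Minimal sketch of the Docker-to-Singularity workflow described above.
# The bioboxes-style image reference and paths are placeholders;
# `singularity build docker://...` and `singularity exec` are standard
# Singularity usage.

def docker_to_singularity_cmd(docker_ref, sif_path):
    """Command that pulls a Docker image and converts it to a SIF file."""
    return ["singularity", "build", sif_path, "docker://" + docker_ref]

def run_in_container_cmd(sif_path, tool_cmd):
    """Command that runs a tool inside the converted image."""
    return ["singularity", "exec", sif_path] + tool_cmd

build = docker_to_singularity_cmd("biocontainers/samtools:v1.9", "samtools.sif")
run = run_in_container_cmd("samtools.sif", ["samtools", "--version"])
print(" ".join(build))
print(" ".join(run))
# These lists could be handed to subprocess.run() on a system with
# Singularity installed; here they are only printed.
```

This is what makes workflows portable between clouds (Docker) and HPC systems (Singularity) without rebuilding the tools themselves.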
24.
Community Datasets
• Hosting mature corpora of data and data tools for
an open science community
– Accessible by multiple users, multiple groups.
– Provision of reusable data management tools
– Facilitate collaboration
– Offload data management
• Interoperable with HPC capabilities
– High speed data transfer
– High performance compute capabilities
• Supported copies, maintenance, and guaranteed integrity
• Data resources not subject to project
limitations
Some unique, others with
local caching for
efficiency and to drive
interdisciplinary
research
25.
The Expanding Ecosystem of Bridges
Brain Image Library
Big Data for Better Health
Human BioMolecular Atlas
Campus Clusters
10s of PB
10PB
2.2PB
Hybrid on-prem data/AI/HPDA + Cloud
Dedicated resources + cloud use of Bridges
26.
Big Data for Better Health (BD4BH)
Implementing, applying, and evaluating machine learning
methods for predicting patient outcomes of breast and
lung cancer
University of Pittsburgh Department of Biomedical
Informatics (Gregory Cooper), CMU Machine Learning
(Ziv Bar-Joseph) and Computational Biology (Robert
Murphy), and PSC (Nick Nystrom, Alex Ropelewski)
Dedicated 2.2PB file system (/pghbio) attached to Bridges
for long-term data management & collaboration
Big Data research training opportunities: summer program
for Lincoln University students
27. Confocal Fluorescence Microscopy:
multispectral, subcellular resolution, highly quantitative
Will contain whole-brain volumetric images of mouse, rat, and other
mammals, targeted experiments highlighting connectivity between
cells, spatial transcriptomic data, and metadata describing essential
information about the experiments.
Supported by the National Institute of Mental Health of the
NIH under award number R24MH114793 ($5M).
Alex Ropelewski (PSC), Marcel Bruchez (CMU Biology),
Simon Watkins (Pitt Cell Biology & Center for Biologic Imaging)
Integrated with Bridges to support additional advanced analytics and
development of AI/ML techniques.
The Brain Image Library
A. M. Watson et al., Ribbon scanning confocal for high-speed high-resolution volume
imaging of brain. PLoS ONE 12 (2017) doi: https://doi.org/10.1371/journal.pone.0180486.
brainimagelibrary.org
28.
Human Biomolecular Atlas Program (HuBMAP)
“The Human BioMolecular Atlas Program (HuBMAP)
aims to facilitate research on single cells within tissues by
supporting data generation and technology development
to explore the relationship between cellular organization
and function, as well as variability in normal tissue
organization at the level of individual cells.” —NIH
The PSC+Pitt team was awarded development of the Infrastructure Component (IC) for the HuBMAP
HIVE (Integration, Visualization & Engagement)
– To receive data from Tissue Mapping Centers at Florida (lymphatic system), CalTech (endothelium),
Vanderbilt, Stanford, and UCSD (kidney, urinary tract, and lung)
– Supporting Tools Components at CMU and Harvard
– Supporting Mapping Components at Indiana University Bloomington and New York Genome Center
– Interfacing with the Collaboration Component at U. of South Dakota
– Supporting Transformative Technology Development centers at CalTech (single-cell transcriptomics),
Stanford (genomic imaging), Purdue (sub-cellular mass spec), and Harvard (proteomics)
Hybrid on-prem data/AI/HPDA + Cloud
29. Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
30. An AI for making decisions with imperfect information:
Beating Top Pros in Heads-Up No-Limit Texas Hold’em Poker
Imperfect-info games require different
algorithms, but apply to important
classes of real-world problems:
– Medical treatment planning
– Negotiation
– Strategic pricing
– Auctions
– Military allocation problems
Heads-up no-limit Texas hold’em is the main
benchmark for games with imperfect information:
– 10^161 situations
Libratus was the first program to beat top humans
Beat 4 top pros playing 120,000 hands over 20 days
Libratus won decisively: 99.98% statistical significance
AI for Strategic Reasoning
Tuomas Sandholm and Noam Brown, Carnegie Mellon University
Prof. Tuomas Sandholm
watching one of the world’s
best players compete against
Libratus.
Libratus improved upon
previous best algorithms
by incorporating real-time
improvements in its
strategy.
31.
AI for Strategic Reasoning
Tuomas Sandholm and Noam Brown, Carnegie Mellon University
Bridges enabled this breakthrough through 19 million core-hours of computing and 2.6 PB of data in the
knowledge base that Libratus generated.
Libratus, under the Chinese name Lengpudashi, or “cold poker master”, also won a 36,000-hand exhibition in China in
April 2017 against a team of six strong Chinese poker players. Further demonstrated at IJCAI 17 (Melbourne, August
2017) and NIPS 2017 (Long Beach, December 2017).
“The best AI's ability to do strategic
reasoning with imperfect
information has now surpassed that
of the best humans.”
—Professor Tuomas Sandholm,
—Carnegie Mellon University
1. N. Brown, T. Sandholm, Safe and Nested Subgame Solving for Imperfect-
Information Games, in NIPS 2017, I. Guyon et al., Eds. (Curran Associates,
Inc., Long Beach, California, 2017), pp. 689-699.
2. N. Brown, T. Sandholm, Superhuman AI for heads-up no-limit poker: Libratus
beats top professionals. Science (2017) doi: 10.1126/science.aao1733.
Awarded Best Paper at NIPS 2017; companion paper in Science
32. Prof. Sandholm launched two startups
on Libratus’ algorithms:
Strategic Machine Inc. and Strategy
Robot.
In August 2018, Strategy Robot
received a 2-year contract for up to
$10M from the Pentagon’s Defense
Innovation Unit.
Impact on the National Interest
https://www.wired.com/story/poker-playing-robot-goes-to-pentagon/
Materials Discovery Through Data-Driven Structural Search and Heusler
Nanostructures
Discovery of high-pressure compounds
– Materials discovery using density functional theory and the minima hopping
structure prediction method
– Discovery of FeBi2, the first iron-bismuth compound
– Discovery of two superconducting compounds in the
Cu-Bi system, CuBi and Cu11Bi7
Discovery of a new form of TiO2
– Employed machine learning to explore new TiO2 polymorphs
– Identified a new TiO2 hexagonal nano sheet (HNS)
– The HNS has a tunable band-gap and could be used for photocatalytic water
splitting and H2 production
Materials Discovery for Energy Applications
Chris Wolverton, Northwestern University
AI-Driven HPC
34. Applying machine learning to detect severe storm-causing
clouds
– Leveraging the vast historical archive of satellite imagery, radar
data, and weather report data from the NOAA to train statistical
models including deep neural networks on Bridges’ CPUs and
GPUs
– Achieved high accuracy in detection of cloud patterns
– Developed fundamental statistical methods for data analysis
– Increasing the prediction lead time using deep models and GPUs
Severe Thunderstorm Prediction with Big Visual Data
James Z. Wang et al., Penn State
Detection of severe storm causing comma-shaped clouds
from satellite images
Detection and categorization of bow echoes
from weather radar data
1. Zheng et al., Detecting Comma-shaped Clouds for Severe Weather Forecasting
using Shape and Motion, IEEE Transactions on Geoscience and Remote
Sensing, under 2nd-round review, 2018.
2. J. Ye, P. Wu, J. Z. Wang, J. Li, Fast Discrete Distribution Clustering Using
Wasserstein Barycenter With Sparse Support. IEEE Transactions on Signal
Processing 65, 2317-2332 (2017) doi: 10.1109/TSP.2017.2659647.
35. The High-Luminosity Large Hadron Collider (HL-LHC) will increase
luminosity by 10×, resulting in ~1EB of data.
The Compact Muon Solenoid (CMS) experiment will allow study of the
Standard Model, extra dimensions, and dark matter.
Fermilab is now using Bridges to integrate HPC into their workflow, in
preparation for HL-LHC coming online in 2026.
Fermilab Using Bridges to Prep for CMS @ HL-LHC
Learn more: https://www.psc.edu/news-publications/2930-psc-supplies-computation-to-large-hadron-collider-group
Estimated CPU resources required for CMS into the HL-LHC era, using the
current computing model with parameters projected out for the next 12 years.
From A Roadmap for HEP Software and Computing R&D for the 2020s, HEP
Software Foundation.
CMS Detector. From CERN,
https://home.cern/science/experiments/cms
Event display of heavy-ion collision registered at the CMS
detector on Nov. 8, 2018 (image: Thomas McCauley).
From https://cms.cern/news/2018-heavy-ion-collision-run-
has-started.
36.
Unsupervised Deep Learning Reveals Prognostically Relevant Subtypes of Glioblastoma
Jonathan D. Young, Chunhui Cai, and Xinghua Lu, Univ. of Pittsburgh
Showed that a deep learning model can be trained to represent
biologically and clinically meaningful abstractions of cancer gene
expression data
Data: The Cancer Genome Atlas (1.2 PB)
Hypotheses: Hierarchical structures emerging from deep
learning on gene expression data relate to the cellular signal
system, and the first hidden layer represents signals related to
transcription factor activation. [1]
– Model selection indicates ~1,300 units in the first hidden layer,
consistent with ~1,400 human transcription factors.
– Consensus clustering on the third hidden layer led to discovery of
clusters of glioblastoma multiforme with differential survival.
J. D. Young, C. Cai, X. Lu, Unsupervised deep learning reveals prognostically
relevant subtypes of glioblastoma. BMC Bioinformatics 18, 381 (2017)
doi: 10.1186/s12859-017-1798-2.
“One of these clusters contained all of the glioblastoma
samples with G-CIMP, a known methylation phenotype
driven by the IDH1 mutation and associated with favorable
prognosis, suggesting that the hidden units in the 3rd
hidden layer representations captured a methylation signal
without explicitly using methylation data as input.”
—Jonathan D. Young, Chunhui Cai, and Xinghua Lu
37. Causal Generative Domain Adaptation Networks
– A deep learning model trained with image data from one
hospital (“domain”) may fail to produce reliable
predictions in a different hospital where the data
distribution is different
– A generative domain adaptation network (G-DAN),
implemented using PyTorch, is able to understand
distribution changes and generate new domains
– Incorporating causal structure into the model to form a causal
G-DAN (CG-DAN) can reduce its complexity
and accordingly improve the transfer efficiency
Modeling of Imaging and Genetics using a Deep Graphical Model
Kayhan Batmanghelich, University of Pittsburgh
M. Gong, K. Zhang, B. Huang, C. Glymour, D. Tao, and K. Batmanghelich,
“Causal Generative Domain Adaptation Networks,” arXiv:1804.04333, 2018,
http://arxiv.org/abs/1804.04333.
38.
Multimodal Automatic Speech Recognition (ASR)
Florian Metze (CMU) et al.
2017 Jelinek Summer Workshop on Speech and Language Technology (JSALT)
39. Studying firm and investment fund financial disclosure using
Deep Learning Natural Language Processing models
– Results presented at the Doctoral Consortium at the Text as Data
2018 conference
– An early version linking the text of earnings announcements to
market reactions was presented at the SEC Doctoral
Symposium 2018
Deep Learning for Text-Based Prediction in Finance
Bryan Routledge and Vitaliy Merso, Carnegie Mellon University
“Given the large sizes of our corpora (hundreds of millions of words) and the
computational requirements of the modern Deep Learning models, our work
would be impossible without the support from Bridges.”
—Bryan Routledge, CMU
Many words used by investment funds in letters to their
shareholders are highly context-dependent. For example, the
word “subprime” can be either a very strong signal of a letter
describing a booming market or a very weak one, depending on
what other words appear around it.
40. Privacy-preserving dataset generation
– Fanti & Lin’s recent research aims to understand
fundamentally how Generative Adversarial Networks (GANs)
internally represent complex data structures and to harness
these observations to use GANs for privacy-preserving
dataset generation
– GANs are a new class of data-driven, neural network based
generative models that excel in high dimensions.
This work has led to two papers accepted to NIPS
2018:
– “The power of two samples in generative adversarial
networks” proposes “packing”, a principled approach to
improving the quality of generated images
– “Robustness of conditional GANs to noisy labels” earned a
Spotlight Award at NIPS 2018, proposing a novel,
theoretically sound, and practical GAN architecture that
consistently improves upon baseline approaches to learning
conditional generators where the labels are corrupted by
random noise
Exploring and Generating Data with Generative Adversarial Networks
Giulia Fanti, Zinan Lin, Carnegie Mellon University
CelebA samples generated from DCGAN (upper) and
PacDCGAN2 (lower) show PacDCGAN2 generates more
diverse and sharper images.
1. Z. Lin, A. Khetan, G. Fanti, and S. Oh, “PacGAN: The
power of two samples in generative adversarial networks,”
arXiv:1712.04086, 2017.
2. K. Thekumparampil, A. Khetan, Z. Lin, and S. Oh,
“Robustness of conditional GANs to noisy labels,”
forthcoming in NIPS 2018, 2018 (Spotlight Award).
41. Learning interpretable latent representations:
a deformable generator model disentangles
appearance and geometric information into
two independent latent vectors
– The appearance generator produces the
appearance information, including color,
illumination, identity or category, of an image
– The geometric generator produces displacement
of the coordinates of each pixel and performs
geometric warping, such as stretching and
rotation, on the appearance generator to obtain
the final synthesized image.
The model can learn both representations
from image data in an unsupervised manner.
Towards a Deeper Understanding of Generative Image Models in Vision
Ying Nian Wu, UCLA
Each dimension of the appearance latent vector encodes appearance
information such as color, illumination, and gender. In the first line,
from left to right, the color of the background varies from black to white,
and the gender changes from a woman to a man. In the second line,
the moustache of the man becomes thicker when the corresponding
dimension of Z approaches zero, and the hair of the woman becomes
denser when the corresponding dimension of Z increases. In the third
line, from left to right, the skin color changes from dark to light. In
the fourth line, from left to right, the illumination changes
from the left side of the face to the right side of the face.
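The appearance/geometry factorization described above can be illustrated with a toy example: an "appearance" image is resampled according to per-pixel displacements from a "geometric" field to produce the final image. The real model uses learned deep generators and differentiable bilinear sampling; this sketch uses a fixed 3x3 grid and nearest-neighbor sampling:

```python
# Toy illustration of the deformable-generator idea: warp an appearance
# image by a per-pixel displacement field. Nearest-neighbor sampling on a
# tiny fixed grid stands in for the model's learned, differentiable warp.

def warp(appearance, displacement):
    """For each output pixel (y, x), sample appearance at (y+dy, x+dx)."""
    h, w = len(appearance), len(appearance[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = displacement[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp to image bounds
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = appearance[sy][sx]
    return out

appearance = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

# A uniform (0, +1) displacement: every pixel samples its right neighbor,
# shifting the whole image one pixel to the left.
shift_left = [[(0, 1)] * 3 for _ in range(3)]
print(warp(appearance, shift_left))
```

Because the displacement field is a separate input, geometric operations such as stretching or rotation can be varied without touching the appearance content, which is exactly the disentanglement the slide describes.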
42. Exploiting Resolution to Tune Accuracy and Speed
– The AdaScale project exploits the resolution of the image "as a
knob" to improve the accuracy and speed of deep-neural-network-based
object detection.
Towards Real-time Video Object Detection Using Adaptive Scaling
Ting-Wu (Rudy) Chin, Ruizhou Ding, and Diana Marculescu, Carnegie Mellon University
[Figures: qualitative detection accuracy without and with AdaScale; the performance of AdaScale on various baselines]
1. T.-W. Chin, R. Ding, and D. Marculescu, “AdaScale: Towards Real-Time
Video Object Detection Using Adaptive Scaling,” in SysML 2019, 2019
[Online]. Available: https://www.sysml.cc/papers.html#
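The "resolution as a knob" idea can be sketched with simple arithmetic: detector cost grows roughly with pixel count, so downscaling trades accuracy for speed. The baseline resolution and latency below are made-up calibration numbers for illustration, not measured AdaScale results:

```python
# Illustrative sketch of resolution-as-a-knob: pick the largest input
# resolution that fits a latency budget, assuming detector cost scales
# with pixel count. The calibration numbers are hypothetical.

BASE_SIDE = 600        # baseline input resolution (pixels)
BASE_LATENCY_MS = 100  # hypothetical detector latency at BASE_SIDE

def latency_ms(side):
    """Estimated latency, assuming cost proportional to pixel count."""
    return BASE_LATENCY_MS * (side / BASE_SIDE) ** 2

def best_side_for_budget(budget_ms, candidates=(240, 360, 480, 600)):
    """Largest candidate resolution within budget; smallest if none fits."""
    feasible = [s for s in candidates if latency_ms(s) <= budget_ms]
    return max(feasible) if feasible else min(candidates)

print(best_side_for_budget(50))   # a 50 ms budget forces a 360-pixel input
print(best_side_for_budget(100))  # a 100 ms budget allows full resolution
```

AdaScale goes further by choosing the scale adaptively per image to improve accuracy as well as speed, but the quadratic cost-resolution trade-off above is the underlying lever.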
43. Extracting high-quality information about energy systems
from overhead imagery with deep learning
– Precise locations of buildings (energy consumption)
– Small-scale solar arrays (energy generation)
– Improved speed and performance by expanding the receptive
field of neural networks only during label inference
Mapping Energy Infrastructure Using Deep Learning and Large Remote Sensing Datasets
Jordan Malof, Duke University
B. Huang et al., “Large-scale semantic classification: outcome
of the first year of Inria aerial image labeling benchmark,” in
IEEE International Geoscience and Remote Sensing
Symposium – IGARSS 2018, 2018.
https://hal.inria.fr/hal-01767807
Figures: satellite image with building mappings; aerial photograph with
solar mappings; performance (higher is better) and computation time
(lower is better) versus increasing receptive field size (in pixels).
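The receptive-field point can be made concrete with a back-of-the-envelope calculation, assuming a simple 1-D convolution stack; the layer configuration is illustrative, not the network from the paper.

```python
# Why expanding the receptive field only at inference helps: raising
# dilation widens how far each output "sees" without retraining or
# adding parameters.

def receptive_field(layers):
    """Receptive field of a conv stack; each layer is
    (kernel_size, stride, dilation)."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump   # dilation widens each tap's reach
        jump *= s
    return rf

rf_plain = receptive_field([(3, 1, 1)] * 4)    # plain stack of 3-wide convs
rf_dilated = receptive_field([(3, 1, 2)] * 4)  # same stack, dilated at inference
```

Doubling the dilation nearly doubles the receptive field of the same four layers, which is the kind of label-inference-time expansion the bullet describes.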
44. The Project in Figures
– 4 cameras
– 5 weeks of data collection (Aug 24 to Sep 28, 2018)
– 3200 hours of video processed
– 250 million detections
– 12 categories: pedestrians, trolleys, seats, tables,
sun umbrellas, tents, cars, pickups, vans, trucks,
bikes, motorcycles
Motivations
– Public safety
– Pedestrian flow and crowd management
– Impact on vehicular traffic
– Impact assessment for venues and events
Technology Capabilities
– Number of people, vehicles, and objects detected
– Segmentation
– Location, trajectory, speed
– Prediction
– Anonymity by design
Understanding Public Space Use in Market Square
Javier Argota Sánchez-Vaquerizo, Carnegie Mellon University
Insights
– Effect of weather (rain) on attendance
– Uneven distribution of pedestrians in the space
– Positive impact of events and venues on attendance
– Short duration of visits
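A minimal sketch shows how trajectories and speeds can be derived from frame-by-frame detections like those collected here; the detection format, frame rate, and pixel-to-meter calibration are assumptions for the example, not details of the deployed system.

```python
# Estimate one tracked object's average speed from its detections,
# each given as (frame_index, x_pixel, y_pixel).

def track_speed(detections, fps=30.0, meters_per_pixel=0.005):
    """Average speed in m/s over a sequence of detections."""
    dist = 0.0
    for (f0, x0, y0), (f1, x1, y1) in zip(detections, detections[1:]):
        # Euclidean pixel displacement, converted to meters.
        dist += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 * meters_per_pixel
    elapsed = (detections[-1][0] - detections[0][0]) / fps
    return dist / elapsed

# A pedestrian moving 10 pixels per frame horizontally across 3 frames.
speed = track_speed([(0, 0, 0), (1, 10, 0), (2, 20, 0)])
```

With these assumed calibrations the example works out to about 1.5 m/s, a plausible walking pace, which is the kind of per-object statistic that feeds the flow and duration insights above.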
47. Object detection in computer vision traditionally works
with relatively low-resolution images. However, the
resolution of recording devices is increasing, requiring
new methods for processing high-resolution data.
Ruzicka & Franchetti’s attention pipeline method
uses a two-stage evaluation of each image or video
frame, at coarse and refined resolution, to limit
the total number of necessary evaluations.
Both stages use the fast object detection model YOLO v2.
Their distributed-GPU code maintains high accuracy while reaching performance of
3-6 fps on 4K video and 2 fps on 8K video. This outperforms the individual baseline
approaches, while allowing the user to set the trade-off between accuracy and
performance.
Best Paper Finalist at IEEE High Performance Extreme Computing Conference (HPEC) 2018
Fast and Accurate Object Detection in High-Resolution Video Using GPUs
Vic Ruzicka and Franz Franchetti, Carnegie Mellon University
Example of a crowded 4K video frame annotated
with Ruzicka & Franchetti’s method.
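The two-stage idea can be sketched as follows; the region proposer and per-tile detector are hypothetical stand-ins, not YOLO v2 or the authors' pipeline.

```python
# Coarse-to-fine attention pipeline sketch: a cheap pass over a
# downscaled frame proposes active regions, and only those regions are
# re-evaluated at full resolution.

TILE = 512  # side length of a full-resolution crop, in pixels

def coarse_regions(frame_w, frame_h, hotspots):
    """Stage 1: map coarse detections ('hotspots' in [0,1]^2 coords)
    to the full-resolution tiles that need a refined pass."""
    tiles = set()
    for (x, y) in hotspots:
        col = int(x * frame_w) // TILE
        row = int(y * frame_h) // TILE
        tiles.add((col, row))
    return sorted(tiles)

def refined_pass(tiles):
    """Stage 2: pretend to run the detector only on the active tiles."""
    return {"tiles_evaluated": len(tiles)}

# An 8K frame with two people clustered in one corner: only a single
# tile needs the expensive full-resolution evaluation.
stats = refined_pass(coarse_regions(7680, 4320, [(0.01, 0.02), (0.03, 0.05)]))
```

Skipping empty tiles is what keeps the total number of evaluations, and hence the per-frame cost on 4K and 8K video, bounded.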
48. Fast and Accurate Object Detection in High-Resolution Video Using GPUs
Vic Růžička and Franz Franchetti, Carnegie Mellon University
49. Multi-agent path finding (MAPF)
– An essential component of many large-scale, real-world robot
deployments, from aerial swarms to warehouse automation.
– Most state-of-the-art MAPF algorithms still rely on centralized
planning, scaling poorly past a few hundred agents.
– Such planning approaches are maladapted to real-world
deployments, where noise and uncertainty often require paths
be recomputed online, which is impossible when planning
times are in seconds to minutes.
Pathfinding via Reinforcement + Imitation Learning
– Using Bridges-GPU, Sartoretti trained and tested PRIMAL, a novel
framework for MAPF that combines reinforcement and imitation
learning to teach fully-decentralized policies, where agents
reactively plan paths online in a partially-observable world while
exhibiting implicit coordination.
– In low obstacle-density environments, PRIMAL outperforms state-of-the-art MAPF planners in certain cases, even
though these have access to the whole state of the system. They also deployed PRIMAL on physical and simulated
robots in a factory mockup scenario, showing how robots can benefit from their online, local-information-based,
decentralized MAPF approach.
Distributed Learning for Large-Scale Multi-Robot Path Planning in Complex Environments
Guillaume Sartoretti, Carnegie Mellon University
Example problem where 100 simulated robots (white dots) must
compute individual, collision-free paths in a large factory-like
environment. Reproduced from [1].
1. G. Sartoretti et al., “PRIMAL: Pathfinding via
Reinforcement and Imitation Multi-Agent Learning,”
2018. http://arxiv.org/abs/1809.03531.
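A deliberately simple stand-in for the decentralized policy illustrates reactive, local-information planning; the greedy rule below is an assumption for illustration, not PRIMAL's learned policy.

```python
# Each agent sees only local occupancy and reactively picks its next
# grid step toward its goal -- no central planner.

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # E, W, S, N, wait

def local_step(pos, goal, occupied):
    """One decentralized decision: greedily reduce Manhattan distance
    to the goal, waiting if every improving cell is occupied."""
    best, best_d = (0, 0), abs(goal[0] - pos[0]) + abs(goal[1] - pos[1])
    for dr, dc in MOVES:
        nxt = (pos[0] + dr, pos[1] + dc)
        d = abs(goal[0] - nxt[0]) + abs(goal[1] - nxt[1])
        if nxt not in occupied and d < best_d:
            best, best_d = (dr, dc), d
    return (pos[0] + best[0], pos[1] + best[1])

# An agent heading east waits when its neighbor's cell is occupied,
# and advances once the cell is free -- implicit, online coordination.
blocked = local_step((0, 0), (0, 2), occupied={(0, 1)})
clear = local_step((0, 0), (0, 2), occupied=set())
```

Because each decision uses only the agent's local observation, paths can be recomputed online at every step, which is exactly what centralized planners with seconds-to-minutes planning times cannot do.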
50. AIDR 2019: Artificial Intelligence for Data Discovery and Reuse
https://events.library.cmu.edu/aidr2019/
Automation in data discovery
Automation in data curation and generation
Measuring and improving data quality
Integrating datasets and enabling interoperability
Biomedical data discovery and reuse
Data privacy, security and algorithmic bias
The future of scientific data and how we work together
Deadline for Abstracts: February 22
KEYNOTES & INVITED SPEAKERS
– Tom M. Mitchell, Interim Dean and E. Fredkin University Professor, School of Computer Science, Carnegie Mellon University
– Glen de Vries, President and Co-founder, Medidata Solutions
– Robert F. Murphy, Ray and Stephanie Lane Professor and Head of Computational Biology, School of Computer Science, Carnegie Mellon University
– Natasha Noy, Staff Scientist, Google AI
52. Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
53. Summary
PSC’s approach to scalable, converged HPC+AI is enabling breakthroughs
across an extremely broad range of research areas.
– These resources (Bridges, including Bridges-AI) are available at no charge for
research and education.
– Bridges-AI builds on Bridges’ strengths in converged HPC, AI, and Big Data to provide
a unique platform for AI and AI-enabled simulation.
To request a free research/education allocation, visit:
https://psc.edu/about-bridges/apply
PSC offers powerful resources for computing, artificial intelligence, and data management and analytics that are available at no charge for open research and to support coursework. In this talk, we will survey examples of breakthroughs that are using PSC resources and ways to leverage PSC for your own research. Examples will highlight successes in genomics, AI, neuroscience, engineering, and other fields.
We will highlight two PSC resources that provide unique capabilities: Bridges and Anton 2. Bridges converges high-performance computing (HPC), artificial intelligence (AI), and Big Data and offers a familiar, exceptionally flexible user environment, applicable to whatever data analytics or simulation exceeds groups’ local capabilities. Anton 2 is a special-purpose computer that dramatically increases the speed of molecular dynamics (MD) simulations, enabling the study of the motions and interactions of proteins and other biologically important molecules over much longer time periods than would otherwise be accessible.
We will also describe Compass AI, a new initiative to help the community make the most of emerging hardware and software technologies for AI, develop best practices, provide education and training, and establish collaborations, especially between academia and the private sector. We outline areas of expertise at PSC where we are conducting research and are open to additional collaboration. We close with a summary of opportunities to co-locate computational resources at PSC, with possible benefits of saving money, bursting to larger resources when needed, and leveraging PSC’s broad software collection.
Docker: For example, we run a Docker-based instance of the Galaxy workflow framework, in this case for a
Data submission pipelines are currently being developed. The Archive expects to receive data from users in 2018.
Ribbon scanning demonstrates subcellular detail of a complete rat coronal section using 40x magnification. A whole coronal section of a rabies-infected rat brain was stained for rabies (green) and nuclei (NeuN, red).
“Multicolor large-volume imaging by ribbon scanning confocal microscopy. A mouse was infected subcutaneously with VEEV TrD TaV-cherry (red) and at 96 hours post infection, fluorescent beads (green) were introduced into the vasculature by perfusion. The brain was harvested, sectioned approximately 4mm thick, and cleared by CUBIC. The section was imaged approximately two millimeters deep (the limit of the Olympus 25x, 1.05NA objective) on both sides and reconstructed as one volume.”
Lead with Libratus to show what scalable computing and data can make possible.
The discovery and characterization of new materials represents a fundamental challenge of materials science. Towards this goal, computational approaches offer significant promise. Here we propose to utilize the combination of first-principles electronic structure calculations, materials databases/informatics, crystal structure prediction, and machine learning methods to accelerate the discovery and characterization of novel materials. New oxide thermochemical water splitting materials will be sought via high-throughput calculations based on elemental substitution and screening of materials databases. The stability of metal-rich chalcogenide systems, a materials class with interesting topological and transport properties, will be assessed via high-throughput calculations to seek out new candidates for experimental synthesis. A genetic algorithm will be employed to help finally solve the crystal structures for thousands of unsolved compounds in the powder diffraction file. The atomic reconstructions at grain boundaries and interfaces in battery materials will be identified via minima hopping structural search. Using machine learning models based on neural networks, new methods will be developed for the accurate prediction of compound formation energy from structure and atomic structure from experimental microscopy. In this project, the tools and approaches to address the challenge of discovering and characterizing new materials will be developed and tested for various applications. We are requesting 5,275,000 SU on Bridges and 1,950,000 SU on Comet for this effort. The materials discovered and the methods developed in this project will be made publicly available via the Open Quantum Materials Database.