Transforming Research in Collaboration with Funding Agencies

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Sanjay Padhi
Amazon Web Services
Session Code: 194315

Today
Chaitanya Baru
Harnessing the Data Revolution
Andrea Norris
Transforming Research at
The National Institutes of Health

https://datascience.nih.gov/

Data Commons on Amazon Web Services
AWS Research and Technical Computing: https://amzn.to/2tiPSY1

• Developing data-driven platforms that integrate large amounts of genomic and clinical data from different disease types.
• Empowering the collaborative discovery, engagement, and necessary partnerships across disease communities that are crucial for progress in our biological understanding of diseases.
• Enabling rapid translation to personalized treatments for patients diagnosed with childhood cancer or structural birth defects.
• Accelerating discovery of genetic causes and shared biologic pathways within and across these conditions.

Collaborative program with the National Science Foundation (NSF)
• The big data program, supported by multiple directorates at NSF, provides up to $26.5 million in funding, in addition to cloud credits, for cutting-edge big data research in the cloud over a period of 3-4 years (up to 2021)
• Big Data Awards (2017): 5 of the 8 awards went to researchers using AWS for research
Research Initiatives - https://amzn.to/2GVxx9a

Examples: Research supported by the AWS NSF Collaboration
• Detecting Financial Market Manipulation: An Integrated Data- and Model-Driven Approach
University of Michigan, Georgia Tech
• Scalable and Interpretable machine learning: bridging mechanistic and data-driven modeling in the biological sciences
University of California, Berkeley
• Taming Big Networks via Embedding
University of Virginia, University of Illinois at Urbana-Champaign
• Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
Kansas State University, University of North Texas and Pennsylvania State University
• Distributed Semi-Supervised Training of Deep Models and Its Applications in Video Understanding
University of Central Florida
“In today's era of data-driven science and engineering, we are pleased to work with the AWS Research Initiative via the NSF
BIGDATA program, to provide cloud resources for our Nation’s researchers to foster and accelerate discovery and innovation.”
Dr. Jim Kurose, Assistant Director, CISE, National Science Foundation (NSF)
“This NSF big data award, coupled with AWS’s advanced computational and analytic services, is expected to help unlock the
secrets of interactions among biomolecules that drive human and animal biological processes.”
Dr. Bin Yu, Chancellor’s Professor at University of California, Berkeley

Andrea Norris
Director, Center for Information Technology &
Chief Information Officer, National Institutes of Health
June 20, 2018
Session Code: 194315
Transforming Research at
The National Institutes of Health

The National Institutes of Health (NIH) Mission
. . .to seek fundamental knowledge about the nature and
behavior of living systems and the application of that
knowledge to enhance health, lengthen life, and reduce the
burdens of illness and disability.

Big Biomedical Data
Imaging
EHR
Clinical
Genomic
Other 'Omics
Exposure

NIH Strategic Plan for Data Science
Data Infrastructure
• Optimize data storage and security
• Connect NIH data systems
Modernized Data Solution
• Modernize data repository solution
• Support storage and sharing of individual datasets
• Better integrate clinical and observational data into biomedical data science
Data Management, Analytics, and Tools
• Support useful, generalizable, and accessible tools and workflows
• Broaden utility of and access to specialized tools
• Improve discovery and cataloging resources
Workforce Development
• Enhance the NIH data-science workforce
• Expand the national research workforce
• Engage a broader community
Stewardship and Sustainability
• Develop policies for a FAIR data solution
• Enhance stewardship

The Cancer Genome Atlas (TCGA)
• Biospecimen Repositories
• Genomics Data Analysis Centers
• Analysis Working Groups
• Proteome Characterization Centers
• Genome Characterization Centers
• Data Coordinating Center

Genomic Data Commons and the NCI Cloud Resources
Genomic Data Commons: Data Submission & Harmonization, Visualization, & Download
Cloud Resources: Compute, Pipelines, Workspaces
Researchers reach both through a Web Interface and APIs
Authentication & Authorization through NIH systems
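
A minimal sketch of how a researcher might address the GDC APIs named on this slide. The base URL `https://api.gdc.cancer.gov`, the `projects` endpoint, the filter on `program.name`, and the selected fields are assumptions that do not appear on the slide, and the helper name `build_projects_query` is hypothetical. The code only builds the request URL and makes no network call.

```python
import json
from urllib.parse import urlencode

# Assumption: public GDC API base URL; the slide only says "APIs".
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_projects_query(program: str, size: int = 10) -> str:
    """Build a URL asking the `projects` endpoint for one program's projects."""
    # Filter expression in the JSON op/content/field/value form (assumed field name).
    filters = {
        "op": "in",
        "content": {"field": "program.name", "value": [program]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "project_id,name,primary_site",
        "format": "JSON",
        "size": str(size),
    }
    # urlencode percent-escapes the JSON filter so it is safe in a query string.
    return f"{GDC_API_BASE}/projects?{urlencode(params)}"


url = build_projects_query("TCGA", size=5)
print(url)
```

The resulting URL could be passed to any HTTP client; controlled-access data would additionally go through the NIH authentication and authorization step listed on the slide.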

NIH Microbiome Cloud Project (MCP)
MCP is a collaboration with Amazon Web Services
that aims to improve access to and analysis of data
from the Human Microbiome Project.
• ~5 TB of Human Microbiome Project data, hosted in a public dataset at no cost
• Data analytic tools: researchers can analyze data online

All of Us (Precision Medicine)
• Nurture relationships with one million or more participant partners, from all walks of life, for decades
• Catalyze a robust ecosystem of researchers and funders hungry to use and support it
• Deliver the largest, richest biomedical dataset ever that is easy, safe, and free to access

NIH Data Commons
Interoperable Compute Platforms
Services: APIs, Containers, Indexing
Software: Services & Tools
Scientific Analysis Tools & Workflows
Data
“Reference” Data Sets
Researcher Defined Data
Portal
FAIR: Findable, Accessible, Interoperable, Reusable

Attributed to Warren A. Kibbe, Ph.D
Duke University School of Medicine
Team Science

NIH to Hire a Chief Data Strategist and
Director, Office of Data Science Strategy

Erwin Gianchandani & Chaitan Baru
Directorate for Computer and Information Science and Engineering
National Science Foundation
June 20, 2018
Session Code: 194315
Harnessing the Data Revolution

NSF “Big Ideas”
RESEARCH IDEAS
• Windows on the Universe: The Era of Multi-messenger Astrophysics
• The Quantum Leap: Leading the Next Quantum Revolution
• Navigating the New Arctic
• Understanding the Rules of Life: Predicting Phenotype
• Harnessing Data for 21st Century Science and Engineering
• Work at the Human-Technology Frontier: Shaping the Future
PROCESS IDEAS
• Mid-scale Research Infrastructure
• Growing Convergent Research at NSF
• NSF 2050: Seeding Innovation
• NSF-INCLUDES: Enhancing Science and Engineering through Diversity

“Engage NSF’s research community in the pursuit of fundamental research in data science and
engineering, the development of a cohesive, federated, national-scale approach to research data
infrastructure, and the development of a 21st-century data-capable workforce.”

Harnessing the Data Revolution: Five Themes
Research across all NSF Directorates
• Science domains: data-intensive research in all areas of science and engineering
• Foundations: theoretical foundations in mathematics, statistics, computer & computational science
• Systems, Algorithms: data-centric algorithms, systems
• Cyberinfrastructure: advanced cyberinfrastructure, accelerating data-intensive research
• Education, Workforce: educational pathways; innovations grounded in an education-research-based framework

Data-Intensive Science & Engineering
Science Problems and Data Challenges…
– Space weather: predictions requiring comprehensive synthesis of
diverse sets of observations with state of the art modeling
– Multi-messenger astrophysics: Utilizing data collected by the
latest generation of observational facilities
– ECO/GEO: Ecosystem forecasting: Understanding decadal-scale
changes in ecosystems and the resources they provide;
integrating data from disparate sources, e.g., NEON, LTER sites,
NOAA satellites, EarthCube and other NSF programs…
– DMR: Using and integrating data from multiple laboratories,
techniques, and/or chemical systems can accelerate discovery of
more efficient or selective catalysts.
– Themes from other NSF Big Ideas requiring advances in data
science…

HDR Institutes to Tackle Data Challenges in Data-
Intensive Science and Engineering Research
• E.g., Real-time data: Sensing, analysis, assimilation,
decision-making with real-time data streams
• E.g., Integration across multi-scale, multi-modal data, with
data assimilation, to enable forecasting, predictions, …
• All HDR activities are linked. Institutes must link with:
– Foundations: TRIPODS Centers
– Systems/Algorithms: Open Knowledge Network, Model Commons
– Education and Workforce Development: HDR Academy, Data
Science Corps
Data-Intensive Science & Engineering

HDR Foundations
NSF Transdisciplinary Research in Principles of Data
Science (TRIPODS) Program
– Establish collaborations among computer and computational
scientists, statisticians, and mathematicians
– Develop the principles of data science – as distinct from CS,
statistics, and mathematics
Foundations

HDR Systems and Algorithms
• Open Knowledge Network
• An open web-scale knowledge network of semantically-
linked concepts and data
• Would foster research on an entire class of new applications
leveraging data, context, and inferences from data
• Should support question/answer interfaces, dialog-based
interactions, explanatory/story-telling interfaces
• Model Commons
– Enabling sharing and reuse of data-intensive models—
including machine learning models
– Provides support for reproducibility, transfer learning…
Systems,
Algorithms

Education and Workforce Development:
HDR Academy
• Catalog, collect, create education/training materials
– Collect data science education / training materials from HDR projects
– Other related NSF projects
– Other, non-NSF sources
• HDR postdocs, undergrads
– Place postdocs and REU students in “cross-training” positions, e.g., a domain scientist placed with a data group, and vice versa
• HDR bootcamps
– Offer data science bootcamps – for grad students; postdocs & junior
faculty; senior faculty…
Education,
Workforce

Education,
Workforce
Data Science Corps
• Data Science Corps Workshop, Dec 7-8, 2017, McCourt School of Public
Policy, Georgetown University, Washington DC
• Link data science students and professionals to data science projects:
– In academia, industry, government, and non-profits
• Via capstone projects, summer internships, coop programs, study abroad
programs, etc.
• Focus especially on community
colleges, 4-year colleges, MSIs, etc.
DataScienceCorps
Students from Academic Programs (varying levels of skills, expertise, and experiences):
• Graduate programs
• Undergraduate programs
• 4-year colleges
• Community colleges
• Online programs
Experts from Industry, NGOs (skills and expertise):
• Industry
• NGOs
• Volunteer Organizations
Project Organizations:
• Industry
• NGOs, e.g., Data Science for Social Good, DataKind, etc.
• Universities, other research institutions
• International Organizations, e.g., World Bank, UNICEF, ITU
• Local / County / State / Federal Governments
Projects in:
• Basic research
• Smart & Connected Communities
• Health
• Criminal Justice
• Transportation
• Energy
• ...

Data Science Education-related Workshops
Education, Workforce, Outreach
• Data science education-related workshops
– National Academies of Science Study/Report on Envisioning the Data Science Discipline: The Undergraduate Perspective, May 2018
– NSF Workshop on Keeping Data Science Broad: Negotiating the Digital and Data Divide, Oct 2017

NSF BIGDATA: Critical Techniques, Technologies and
Methodologies for Advancing Foundations and Applications
of Big Data Sciences and Engineering
• Three cloud providers (Amazon Web Services, Google, Microsoft) @ $3M each
• IBM joined in 2018 @ $3M
• Researchers may request cloud resources for their projects, within a min and max range.

BIGDATA Projects in 2017 using AWS
• Detecting Financial Market Manipulation: An Integrated Data- and Model-Driven Approach
University of Michigan, Georgia Tech
• Scalable and Interpretable machine learning: bridging mechanistic & data-driven modeling in biological sciences
University of California, Berkeley
• Taming Big Networks via Embedding
University of Virginia, University of Illinois at Urbana-Champaign
• Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
Kansas State University, University of North Texas and Pennsylvania State University
• Distributed Semi-Supervised Training of Deep Models and Its Applications in Video Understanding
University of Central Florida
5 of the 8 awards in total went to researchers using AWS

BIGDATA: Domain Adaptation Approaches for
Classifying Crisis Related Data on Social Media
Kansas State University PIs: Doina Caragea, Cornelia Caragea and Dan Andresen
Pennsylvania State University PIs: Andrea Tapia, Jess Kropczynski

Iterative Random Forests (iRF) and iRF2.0: Genome-Wide Epistasis Studies (GWES)
PI: Bin Yu, co-PI: Ben Brown
• iRF and iRF2.0 discover predictive and stable high-order interactions among variables, including biomolecules, for interpretations and follow-up experiments and studies
• We study 750K human genotypes along with 30 years of medical records to learn the genetic architecture of depression, heart disease, and prostate cancer

Semi Supervised Semantic Segmentation Using
Generative Adversarial Network (ICCV-2017)
N. Souly, C. Spampinato and M. Shah
Improving the Improved Training of Wasserstein GANs (CT-GAN): A Consistency Term
and Its Dual Effect (ICLR-2018)
X. Wei, B. Gong, Z. Liu, W. Lu, L. Wang
• Problem: The gradient penalty GP|x often fails to check the continuity of the region near the real data x. Illustration is shown to the right.
• Our Approach: To alleviate the issue, we explicitly check the continuity condition by using two perturbations x′, x′′ near any observed real data point x.
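
A minimal sketch of the continuity check described above, not the authors' implementation. It assumes the two perturbations x′ and x′′ come from two stochastic dropout passes through a toy linear critic, and that the penalty is a hinge on the squared gap between the two outputs. The names `critic`, `consistency_term`, `drop_prob`, and `margin` are hypothetical.

```python
import random


def critic(x, weights, drop_prob, rng):
    """Toy linear critic with dropout on its inputs (stand-in for a deep model)."""
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() >= drop_prob:
            # Inverted-dropout scaling keeps the expected output unchanged.
            total += wi * xi / (1.0 - drop_prob)
    return total


def consistency_term(batch, weights, drop_prob=0.5, margin=0.0, seed=0):
    """Mean hinge penalty on the gap between two perturbed critic outputs."""
    rng = random.Random(seed)
    penalties = []
    for x in batch:
        d1 = critic(x, weights, drop_prob, rng)  # critic on perturbation x'
        d2 = critic(x, weights, drop_prob, rng)  # critic on perturbation x''
        # Penalize only when the two outputs differ by more than the margin.
        penalties.append(max(0.0, (d1 - d2) ** 2 - margin))
    return sum(penalties) / len(penalties)


batch = [[0.2, -1.0, 0.5], [1.5, 0.3, -0.7]]  # illustrative "real" data points
weights = [0.4, -0.6, 1.1]                    # illustrative critic weights
print(consistency_term(batch, weights, drop_prob=0.5))
```

With `drop_prob=0.0` the two passes coincide and the term is zero; in training, such a term would be added to the critic loss alongside the gradient penalty.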
Benefits from Amazon AWS
• A pre-configured environment to easily build our deep learning programs.
• 4 weeks on our two local 1080 GPUs vs. 4 days on one AWS p3.16xlarge instance! An incredible speed!
Semi-Supervised Training with Generative Adversarial Networks on AWS
University of Central Florida, Drs. Mubarak Shah (PI) and Liqiang Wang (Co-PI)
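
The run times reported on this slide imply the speedup worked out below. The wall-clock figures come from the slide; the GPU count for p3.16xlarge (8 GPUs) is not on the slide and is stated here as an assumption.

```python
from datetime import timedelta

# Wall-clock times reported on the slide.
local_run = timedelta(weeks=4)  # two local 1080 GPUs
aws_run = timedelta(days=4)     # one AWS p3.16xlarge instance

speedup = local_run / aws_run   # timedelta / timedelta gives a float
days_saved = (local_run - aws_run).days

# Assumption (not on the slide): a p3.16xlarge exposes 8 GPUs.
local_gpu_days = 2 * local_run.days
aws_gpu_days = 8 * aws_run.days

print(f"speedup: {speedup:.0f}x, days saved: {days_saved}")
print(f"GPU-days: local {local_gpu_days}, AWS {aws_gpu_days}")
```

Under that assumption the gain comes from both more GPUs and faster ones: the AWS run finishes 7x sooner while using fewer total GPU-days.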

Thank you!
