© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Sanjay Padhi
Amazon Web Services
Session Code: 194315
Transforming Research in
Collaboration with Funding Agencies
Today
Chaitanya Baru
Harnessing the Data Revolution
Andrea Norris
Transforming Research at
The National Institutes of Health
https://datascience.nih.gov/
Data Commons on Amazon Web Services
AWS Research and Technical Computing: https://amzn.to/2tiPSY1
• Developing data-driven platforms that integrate large amounts of genomic and clinical data from different disease types.
• Empowering the collaborative discovery, engagement, and necessary partnerships across disease communities that are crucial for progress in our biological understanding of diseases.
• Enabling rapid translation to personalized treatments for patients diagnosed with childhood cancer or structural birth defects.
• Accelerating discovery of genetic causes and shared biologic pathways within and across these conditions.
Collaborative program with the National Science Foundation (NSF)
• The big data program, supported by multiple directorates at NSF, provides up to $26.5 million in funding, plus cloud credits, for cutting-edge big data research on the cloud over a period of 3-4 years (up to 2021)
• Big Data Awards (2017): 5 of the 8 awards went to researchers using AWS for Research
Research Initiatives - https://amzn.to/2GVxx9a
Examples: Research supported by the AWS NSF Collaboration
• Detecting Financial Market Manipulation: An Integrated Data- and Model-Driven Approach
University of Michigan, Georgia Tech
• Scalable and Interpretable machine learning: bridging mechanistic and data-driven modeling in the biological sciences
University of California, Berkeley
• Taming Big Networks via Embedding
University of Virginia, University of Illinois at Urbana-Champaign
• Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
Kansas State University, University of North Texas and Pennsylvania State University
• Distributed Semi-Supervised Training of Deep Models and Its Applications in Video Understanding
University of Central Florida
“In today's era of data-driven science and engineering, we are pleased to work with the AWS Research Initiative via the NSF
BIGDATA program, to provide cloud resources for our Nation’s researchers to foster and accelerate discovery and innovation.”
Dr. Jim Kurose, Assistant Director, CISE, National Science Foundation (NSF)
“This NSF big data award, coupled with AWS’s advanced computational and analytic services, is expected to help unlock the
secrets of interactions among biomolecules that drive human and animal biological processes.”
Dr. Bin Yu, Chancellor’s Professor at University of California, Berkeley
Andrea Norris
Director, Center for Information Technology &
Chief Information Officer, National Institutes of Health
June 20, 2018
Session Code: 194315
Transforming Research at
The National Institutes of Health
The National Institutes of Health (NIH) Mission
...to seek fundamental knowledge about the nature and
behavior of living systems and the application of that
knowledge to enhance health, lengthen life, and reduce the
burdens of illness and disability.
Big Biomedical Data
• Imaging
• EHR
• Clinical
• Genomic
• Other 'Omics
• Exposure
NIH Strategic Plan for Data Science
Data Infrastructure
• Optimize data storage and security
• Connect NIH data systems
Modernized Data Solution
• Modernize data repository solution
• Support storage and sharing of individual datasets
• Better integrate clinical and observational data into biomedical data science
Data Management, Analytics, and Tools
• Support useful, generalizable, and accessible tools and workflows
• Broaden utility of and access to specialized tools
• Improve discovery and cataloging resources
Workforce Development
• Enhance the NIH data-science workforce
• Expand the national research workforce
• Engage a broader community
Stewardship and Sustainability
• Develop policies for a FAIR data solution
• Enhance stewardship
The Cancer Genome Atlas (TCGA)
• Biospecimen Repositories
• Genomics Data Analysis Centers
• Analysis Working Groups
• Proteome Characterization Centers
• Genome Characterization Centers
• Data Coordinating Center
Genomic Data Commons and the NCI Cloud Resources
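[Diagram: how researchers reach the Genomic Data Commons and the Cloud Resources]
• Data Submission & Harmonization into the GDC: Web Interface, APIs
• Genomic Data Commons: Harmonization, Visualization, & Download
• Cloud Resources: Compute, Pipelines, Workspaces
• Researchers connect through APIs and a Web Interface
• Authentication & Authorization through NIH systems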
NIH Microbiome Cloud Project (MCP)
MCP is a collaboration with Amazon Web Services
that aims to improve access to and analysis of data
from the Human Microbiome Project.
• ~5 TB of Human Microbiome Project Data: hosted in a public dataset at no cost
• Data analytic tools: researchers can analyze data online
All of Us (Precision Medicine)
• Nurture relationships with one million or more participant partners, from all walks of life, for decades
• Catalyze a robust ecosystem of researchers and funders hungry to use and support it
• Deliver the largest, richest biomedical dataset ever that is easy, safe, and free to access
NIH Data Commons
Interoperable Compute Platforms
Services: APIs, Containers, Indexing
Software: Services & Tools
Scientific Analysis Tools & Workflows
Data
“Reference” Data Sets
Researcher Defined Data
Portal
FAIR: Findable, Accessible, Interoperable, Reusable
Attributed to Warren A. Kibbe, Ph.D
Duke University School of Medicine
Team Science
NIH to Hire a Chief Data Strategist and
Director, Office of Data Science Strategy
Erwin Gianchandani & Chaitan Baru
Directorate for Computer and Information Science and Engineering
National Science Foundation
June 20, 2018
Session Code: 194315
Harnessing the Data Revolution
NSF “Big Ideas”
RESEARCH IDEAS
• Windows on the Universe: The Era of Multi-messenger Astrophysics
• The Quantum Leap: Leading the Next Quantum Revolution
• Navigating the New Arctic
• Understanding the Rules of Life: Predicting Phenotype
• Harnessing Data for 21st Century Science and Engineering
• Work at the Human-Technology Frontier: Shaping the Future
PROCESS IDEAS
• Mid-scale Research Infrastructure
• Growing Convergent Research at NSF
• NSF 2050: Seeding Innovation
• NSF-INCLUDES: Enhancing Science and Engineering through Diversity
“Engage NSF’s research community in the pursuit of fundamental research in data science and
engineering, the development of a cohesive, federated, national-scale approach to research data
infrastructure, and the development of a 21st-century data-capable workforce.”
Harnessing the Data Revolution: Five Themes
• Foundations: theoretical foundations (mathematics, statistics, computer & computational science)
• Systems, Algorithms: data-centric algorithms, systems
• Science domains: data-intensive research in all areas of science and engineering
• Cyberinfrastructure: advanced cyberinfrastructure, accelerating data-intensive research
• Education, Workforce: educational pathways; innovations grounded in an education-research-based framework
Research across all NSF Directorates
Data-Intensive Science & Engineering
Science Problems and Data Challenges…
– Space weather: predictions requiring comprehensive synthesis of
diverse sets of observations with state of the art modeling
– Multi-messenger astrophysics: Utilizing data collected by the
latest generation of observational facilities
– ECO/GEO: Ecosystem forecasting: Understanding decadal-scale
changes in ecosystems and the resources they provide;
integrating data from disparate sources, e.g., NEON, LTER sites,
NOAA satellites, EarthCube and other NSF programs…
– DMR: Using and integrating data from multiple laboratories,
techniques, and/or chemical systems can accelerate discovery of
more efficient or selective catalysts.
– Themes from other NSF Big Ideas requiring advances in data
science…
HDR Institutes to Tackle Data Challenges in Data-Intensive Science and Engineering Research
• E.g., Real-time data: Sensing, analysis, assimilation,
decision-making with real-time data streams
• E.g., Integration across multi-scale, multi-modal data, with
data assimilation, to enable forecasting, predictions, …
• All HDR activities are linked. Institutes must link with:
– Foundations: TRIPODS Centers
– Systems/Algorithms: Open Knowledge Network, Model Commons
– Education and Workforce Development: HDR Academy, Data
Science Corps
Data-Intensive Science & Engineering
HDR Foundations
NSF Transdisciplinary Research in Principles of Data
Science (TRIPODS) Program
– Establish collaborations among computer and computational
scientists, statisticians, and mathematicians
– Develop the principles of data science – as distinct from CS,
statistics, and mathematics
Foundations
HDR Systems and Algorithms
• Open Knowledge Network
– An open web-scale knowledge network of semantically-linked concepts and data
– Would foster research on an entire class of new applications leveraging data, context, and inferences from data
– Should support question/answer interfaces, dialog-based interactions, explanatory/story-telling interfaces
• Model Commons
– Enabling sharing and reuse of data-intensive models—
including machine learning models
– Provides support for reproducibility, transfer learning…
Systems,
Algorithms
Education and Workforce Development:
HDR Academy
• Catalog, collect, create education/training materials
– Collect data science education / training materials from HDR projects
– Other related NSF projects
– Other, non-NSF sources
• HDR postdocs, undergrads
– Place postdocs and REU students in “cross-training” positions, e.g., a domain scientist placed with a data group, and vice versa
• HDR bootcamps
– Offer data science bootcamps – for grad students; postdocs & junior
faculty; senior faculty…
Education,
Workforce
Education,
Workforce
Data Science Corps
• Data Science Corps Workshop, Dec 7-8, 2017, McCourt School of Public
Policy, Georgetown University, Washington DC
• Link data science students and professionals to data science projects:
– In academia, industry, government, and non-profits
• Via capstone projects, summer internships, coop programs, study abroad
programs, etc.
• Focus especially on community
colleges, 4-year colleges, MSIs, etc.
[Figure: Data Science Corps, from the Data Science Corps Workshop, December 7-8, 2017]
• Students from Academic Programs (varying levels of skills, expertise, and experiences): Graduate programs, Undergraduate programs, 4-year colleges, Community colleges, Online programs
• Experts from Industry, NGOs (skills and expertise): Industry, NGOs, Volunteer Organizations
• Project Organizations: Industry; NGOs, e.g., Data Science for Social Good, DataKind, etc.; Universities, other research institutions; International Organizations, e.g., World Bank, UNICEF, ITU; Local / County / State / Federal Governments
• Projects in: Basic research, Smart & Connected Communities, Health, Criminal Justice, Transportation, Energy, ...
Data Science Education-related Workshops
Education, Workforce, Outreach
• Data science education-related workshops
– National Academies of Science Study/Report on Envisioning the Data Science Discipline: The Undergraduate Perspective, May 2018
– NSF Workshop on Keeping Data Science Broad: Negotiating the Digital and Data Divide, Oct 2017
NSF BIGDATA: Critical Techniques, Technologies and
Methodologies for Advancing Foundations and Applications
of Big Data Sciences and Engineering
• Three cloud providers (Amazon Web Services, Google, Microsoft) at $3M each
• IBM joined in 2018 at $3M
• Researchers may request cloud resources for their projects, within a minimum and maximum range.
BIGDATA Projects in 2017 using AWS
• Detecting Financial Market Manipulation: An Integrated Data- and Model-Driven Approach
University of Michigan, Georgia Tech
• Scalable and Interpretable machine learning: bridging mechanistic & data-driven modeling in biological sciences
University of California, Berkeley
• Taming Big Networks via Embedding
University of Virginia, University of Illinois at Urbana-Champaign
• Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media
Kansas State University, University of North Texas and Pennsylvania State University
• Distributed Semi-Supervised Training of Deep Models and Its Applications in Video Understanding
University of Central Florida
Out of a total of 8 awards: 5 were awarded to researchers using AWS
BIGDATA: Domain Adaptation Approaches for
Classifying Crisis Related Data on Social Media
Kansas State University PIs: Doina Caragea, Cornelia Caragea and Dan Andresen
Pennsylvania State University PIs: Andrea Tapia, Jess Kropczynski
Iterative Random Forests (iRF) and iRF2.0: Genome-Wide Epistasis Studies (GWES)
PI: Bin Yu; co-PI: Ben Brown
• iRF and iRF2.0 discover predictive and stable high-order interactions among variables, including biomolecules, for interpretations and follow-up experiments and studies
• We study 750K human genotypes along with 30 years of medical records to learn the genetic architecture of depression, heart disease, and prostate cancer
Semi Supervised Semantic Segmentation Using
Generative Adversarial Network (ICCV-2017)
N. Souly, C. Spampinato and M. Shah
Improving the Improved Training of Wasserstein GANs (CT-GAN): A Consistency Term
and Its Dual Effect (ICLR-2018)
X. Wei, B. Gong, Z. Liu, W. Lu, L. Wang
• Problem: The gradient penalty 𝑮𝑷|𝒙 often fails to check the continuity of the region near the real data 𝒙. Illustration is shown to the right.
• Our Approach: to alleviate the issue, we explicitly check the continuity condition by using two perturbations 𝒙′, 𝒙′′ near any observed real data point 𝒙.
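The idea of checking continuity with two perturbed points can be illustrated with a small Python sketch. This is not the authors' consistency term: it assumes additive uniform noise in input space, an L2 distance, and a hinge on the slope, and the names `consistency_penalty`, `critic`, `noise`, and `bound` are hypothetical and chosen only for illustration.

```python
import random


def consistency_penalty(critic, x, noise=0.1, bound=1.0, rng=random):
    """Hinge penalty on the critic's slope between two perturbed copies of x.

    Assumption (not from the slides): perturbations are uniform additive
    noise, and the continuity condition is |D(x') - D(x'')| <= bound * ||x' - x''||.
    """
    x1 = [xi + rng.uniform(-noise, noise) for xi in x]  # first perturbation x'
    x2 = [xi + rng.uniform(-noise, noise) for xi in x]  # second perturbation x''
    dist = sum((a - b) ** 2 for a, b in zip(x1, x2)) ** 0.5
    if dist == 0.0:
        return 0.0  # identical perturbations carry no slope information
    slope = abs(critic(x1) - critic(x2)) / dist
    return max(0.0, slope - bound)  # zero whenever the condition holds


rng = random.Random(0)
steep_critic = lambda v: 5.0 * v[0]  # slope 5 violates a bound of 1
print(consistency_penalty(steep_critic, [0.3], rng=rng))  # close to 4.0
```

A critic whose slope stays within the bound yields a penalty of zero, so in a training loop a term like this would only push back where the continuity condition is violated near real data.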
Benefits from Amazon AWS
• A pre-configured environment to easily build our deep learning programs.
• 4 weeks on our two local 1080 GPUs vs. 4 days on one AWS p3.16xlarge instance! An incredible speed!
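As a rough check of the figures quoted above, the wall-clock ratio can be worked out directly. The sketch below assumes "4 weeks" means 28 days; the helper name `speedup` is hypothetical. It compares elapsed time only and says nothing about cost or per-GPU efficiency.

```python
def speedup(baseline_days: float, new_days: float) -> float:
    """Return how many times faster the new setup finished (wall-clock only)."""
    if new_days <= 0:
        raise ValueError("new_days must be positive")
    return baseline_days / new_days


local_days = 4 * 7  # 4 weeks on two local 1080 GPUs, assumed to be 28 days
aws_days = 4        # 4 days on one AWS p3.16xlarge instance
print(f"{speedup(local_days, aws_days):.0f}x faster")  # prints "7x faster"
```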
Semi-Supervised Training with Generative Adversarial Networks on AWS
University of Central Florida, Drs. Mubarak Shah (PI) and Liqiang Wang (Co-PI)
Thank you!

More Related Content

What's hot

The NIH as a Digital Enterprise: Implications for PAG
The NIH as a Digital Enterprise: Implications for PAGThe NIH as a Digital Enterprise: Implications for PAG
The NIH as a Digital Enterprise: Implications for PAG
Philip Bourne
 
Big Data Brown Bag
Big Data Brown BagBig Data Brown Bag
Big Data Brown Bagusmanqureshi
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global Ecosystem
Philip Bourne
 
Big Data in Biomedicine – An NIH Perspective
Big Data in Biomedicine – An NIH PerspectiveBig Data in Biomedicine – An NIH Perspective
Big Data in Biomedicine – An NIH Perspective
Philip Bourne
 
Data report v 0.2 Press Release
Data report v 0.2 Press ReleaseData report v 0.2 Press Release
Data report v 0.2 Press ReleaseRosalyn Moran
 
Big Data in Biomedicine: Where is the NIH Headed
Big Data in Biomedicine: Where is the NIH HeadedBig Data in Biomedicine: Where is the NIH Headed
Big Data in Biomedicine: Where is the NIH Headed
Philip Bourne
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
Michel Dumontier
 
The Vision for Data @ the NIH
The Vision for Data @ the NIHThe Vision for Data @ the NIH
The Vision for Data @ the NIH
Philip Bourne
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problems
inside-BigData.com
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open Science
Research Data Alliance
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe
 
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...
Amit Sheth
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Philip Bourne
 
SWOT Analysis - What Does it Tell Us?
SWOT Analysis - What Does it Tell Us?SWOT Analysis - What Does it Tell Us?
SWOT Analysis - What Does it Tell Us?
Philip Bourne
 
Brief on Linked Data at U.S. EPA to Chief Data Scientist
Brief on Linked Data at U.S. EPA to Chief Data ScientistBrief on Linked Data at U.S. EPA to Chief Data Scientist
Brief on Linked Data at U.S. EPA to Chief Data Scientist
Bernadette Hyland-Wood
 
Why Data Citation Currently Misses the Point
Why Data Citation Currently Misses the PointWhy Data Citation Currently Misses the Point
Why Data Citation Currently Misses the Point
Mark Parsons
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
University of California Curation Center
 
US EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open DataUS EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open Data
3 Round Stones
 
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
ICPSR
 

What's hot (20)

The NIH as a Digital Enterprise: Implications for PAG
The NIH as a Digital Enterprise: Implications for PAGThe NIH as a Digital Enterprise: Implications for PAG
The NIH as a Digital Enterprise: Implications for PAG
 
Big Data Brown Bag
Big Data Brown BagBig Data Brown Bag
Big Data Brown Bag
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global Ecosystem
 
Big Data in Biomedicine – An NIH Perspective
Big Data in Biomedicine – An NIH PerspectiveBig Data in Biomedicine – An NIH Perspective
Big Data in Biomedicine – An NIH Perspective
 
Data report v 0.2 Press Release
Data report v 0.2 Press ReleaseData report v 0.2 Press Release
Data report v 0.2 Press Release
 
Big Data in Biomedicine: Where is the NIH Headed
Big Data in Biomedicine: Where is the NIH HeadedBig Data in Biomedicine: Where is the NIH Headed
Big Data in Biomedicine: Where is the NIH Headed
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
The Vision for Data @ the NIH
The Vision for Data @ the NIHThe Vision for Data @ the NIH
The Vision for Data @ the NIH
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problems
 
EHLP - July 2015 pg 6-8
EHLP - July 2015 pg 6-8EHLP - July 2015 pg 6-8
EHLP - July 2015 pg 6-8
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open Science
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ...
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
SWOT Analysis - What Does it Tell Us?
SWOT Analysis - What Does it Tell Us?SWOT Analysis - What Does it Tell Us?
SWOT Analysis - What Does it Tell Us?
 
Brief on Linked Data at U.S. EPA to Chief Data Scientist
Brief on Linked Data at U.S. EPA to Chief Data ScientistBrief on Linked Data at U.S. EPA to Chief Data Scientist
Brief on Linked Data at U.S. EPA to Chief Data Scientist
 
Why Data Citation Currently Misses the Point
Why Data Citation Currently Misses the PointWhy Data Citation Currently Misses the Point
Why Data Citation Currently Misses the Point
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
 
US EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open DataUS EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open Data
 
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
 

Similar to Transforming Research in Collaboration with Funding Agencies

strata_ny_2016_version_final_no_animation
strata_ny_2016_version_final_no_animationstrata_ny_2016_version_final_no_animation
strata_ny_2016_version_final_no_animationTaposh Dutta Roy
 
Toward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data EcosystemToward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data Ecosystem
Globus
 
Parsec 191119 slideshare
Parsec 191119 slideshareParsec 191119 slideshare
Parsec 191119 slideshare
Alison Specht
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
Robert Grossman
 
Role of data in precision oncology
Role of data in precision oncologyRole of data in precision oncology
Role of data in precision oncology
Warren Kibbe
 
UVA School of Data Science
UVA School of Data ScienceUVA School of Data Science
UVA School of Data Science
Philip Bourne
 
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET
 
A Successful Academic Medical Center Must be a Truly Digital Enterprise
A Successful Academic Medical Center Must be a Truly Digital EnterpriseA Successful Academic Medical Center Must be a Truly Digital Enterprise
A Successful Academic Medical Center Must be a Truly Digital Enterprise
Philip Bourne
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
Robert Grossman
 
Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
Albert Anthony Gavino, MBA
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
Philip Bourne
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data Science
Philip Bourne
 
Next-Gen BI for Healthcare and Life Sciences on AWS
 Next-Gen BI for Healthcare and Life Sciences on AWS Next-Gen BI for Healthcare and Life Sciences on AWS
Next-Gen BI for Healthcare and Life Sciences on AWS
Amazon Web Services
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applications
Padma Metta
 
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and RealityA VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
Paul Courtney
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
Philip Bourne
 
Data Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingData Virtualization Modernizes Biobanking
Data Virtualization Modernizes Biobanking
Denodo
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
Robert Grossman
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
What's up at Kno.e.sis?
What's up at Kno.e.sis? What's up at Kno.e.sis?
What's up at Kno.e.sis?
Amit Sheth
 

Similar to Transforming Research in Collaboration with Funding Agencies (20)

strata_ny_2016_version_final_no_animation
strata_ny_2016_version_final_no_animationstrata_ny_2016_version_final_no_animation
strata_ny_2016_version_final_no_animation
 
Toward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data EcosystemToward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data Ecosystem
 
Parsec 191119 slideshare
Parsec 191119 slideshareParsec 191119 slideshare
Parsec 191119 slideshare
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
Role of data in precision oncology
Role of data in precision oncologyRole of data in precision oncology
Role of data in precision oncology
 
UVA School of Data Science
UVA School of Data ScienceUVA School of Data Science
UVA School of Data Science
 
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...

 


Transforming Research in Collaboration with Funding Agencies

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Sanjay Padhi Amazon Web Services 194315 Transforming Research in Collaboration with Funding Agencies
  • 2. Today Chaitanya Baru Harnessing the Data Revolution Andrea Norris Transforming Research at The National Institutes of Health
  • 3. https://datascience.nih.gov/
  • 4. Data Commons on Amazon Web Services AWS Research and Technical Computing: https://amzn.to/2tiPSY1
  • 5. Developing data-driven platforms that integrate large amounts of genomic and clinical data from different disease types. Empowering the collaborative discovery, engagement, and necessary partnerships across disease communities that are crucial for progress in our biological understanding of diseases. Enabling rapid translation to personalized treatments for patients diagnosed with childhood cancer or structural birth defects. Accelerating discovery of genetic causes and shared biologic pathways within and across these conditions.
  • 6. Collaborative program with the National Science Foundation (NSF) • The big data program, supported by multiple directorates at NSF, provides funds of up to $26.5 million, in addition to cloud credits, to perform cutting-edge big data research on the cloud for a period of 3-4 years (up to 2021) • Big Data Award (2017): Out of 8 awards, 5 were awarded to researchers using AWS for Research. Research Initiatives: https://amzn.to/2GVxx9a
  • 7. Examples: Research supported by the AWS NSF Collaboration • Detecting Financial Market Manipulation: An Integrated Data- and Model-Driven Approach (University of Michigan, Georgia Tech) • Scalable and Interpretable machine learning: bridging mechanistic and data-driven modeling in the biological sciences (University of California, Berkeley) • Taming Big Networks via Embedding (University of Virginia, University of Illinois at Urbana-Champaign) • Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media (Kansas State University, University of North Texas and Pennsylvania State University) • Distributed Semi-Supervised Training of Deep Models and Its Applications in Video Understanding (University of Central Florida) “In today's era of data-driven science and engineering, we are pleased to work with the AWS Research Initiative via the NSF BIGDATA program, to provide cloud resources for our Nation’s researchers to foster and accelerate discovery and innovation.” Dr. Jim Kurose, Assistant Director, CISE, National Science Foundation (NSF) “This NSF big data award, coupled with AWS’s advanced computational and analytic services, is expected to help unlock the secrets of interactions among biomolecules that drive human and animal biological processes.” Dr. Bin Yu, Chancellor’s Professor at University of California,
  • 8. Transforming Research at The National Institutes of Health. Andrea Norris, Director, Center for Information Technology & Chief Information Officer, National Institutes of Health. June 20, 2018. Session Code: 194315
  • 9. The National Institutes of Health (NIH) Mission: ... to seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce the burdens of illness and disability.
  • 10. Big Biomedical Data: Imaging, EHR, Clinical, Genomic, Other 'Omics, Exposure
  • 11. NIH Strategic Plan for Data Science. Data Infrastructure: • Optimize data storage and security • Connect NIH data systems. Modernized Data Solution: • Modernize data repository solution • Support storage and sharing of individual datasets • Better integrate clinical and observational data into biomedical data science. Data Management, Analytics, and Tools: • Support useful, generalizable, and accessible tools and workflows • Broaden utility of and access to specialized tools • Improve discovery and cataloging resources. Workforce Development: • Enhance the NIH data-science workforce • Expand the national research workforce • Engage a broader community. Stewardship and Sustainability: • Develop policies for a FAIR data solution • Enhance stewardship
  • 12. Supported by
  • 13. The Cancer Genome Atlas (TCGA) • Biospecimen Repositories • Genomics Data Analysis Centers • Analysis Working Groups • Proteome Characterization Centers • Genome Characterization Centers • Data Coordinating Center. Supported by
  • 15. Genomic Data Commons and the NCI Cloud Resources. Diagram labels: Web Interface • Data Submission & Harmonization • GDC APIs • Researchers • APIs • Web Interface • Genomic Data Commons: Harmonization, Visualization, & Download • Cloud Resources: Compute, Pipelines, Workspaces • Authentication & Authorization through NIH systems. Supported by
  • 16. NIH Microbiome Cloud Project (MCP): MCP is a collaboration with Amazon Web Services that aims to improve access to and analysis of data from the Human Microbiome Project. • ~5 TB of Human Microbiome Project data, hosted in a public dataset at no cost • Data analytic tools: researchers can analyze data online. Supported by
  • 17. All of Us (Precision Medicine) • Nurture relationships with one million or more participant partners, from all walks of life, for decades • Catalyze a robust ecosystem of researchers and funders hungry to use and support it • Deliver the largest, richest biomedical dataset ever that is easy, safe, and free to access
  • 18. NIH Data Commons. Diagram labels: Interoperable • Compute Platforms • Services: APIs, Containers, Indexing • Software: Services & Tools • Scientific Analysis Tools & Workflows • Data: “Reference” Data Sets, Researcher Defined • Data Portal • FAIR: Findable, Accessible, Interoperable, Reusable. Supported by
  • 19. Team Science. Attributed to Warren A. Kibbe, Ph.D., Duke University School of Medicine
  • 20. NIH to Hire a Chief Data Strategist and Director, Office of Data Science Strategy
  • 21. Harnessing the Data Revolution. Erwin Gianchandani & Chaitan Baru, Directorate for Computer and Information Science and Engineering, National Science Foundation. June 20, 2018. Session Code: 194315
  • 22. NSF “Big Ideas”. RESEARCH IDEAS: • Windows on the Universe: The Era of Multi-messenger Astrophysics • The Quantum Leap: Leading the Next Quantum Revolution • Navigating the New Arctic • Understanding the Rules of Life: Predicting Phenotype • Harnessing Data for 21st Century Science and Engineering • Work at the Human-Technology Frontier: Shaping the Future. PROCESS IDEAS: • Mid-scale Research Infrastructure • Growing Convergent Research at NSF • NSF 2050: Seeding Innovation • NSF-INCLUDES: Enhancing Science and Engineering through Diversity
  • 23. “Engage NSF’s research community in the pursuit of fundamental research in data science and engineering, the development of a cohesive, federated, national-scale approach to research data infrastructure, and the development of a 21st-century data-capable workforce.”
  • 24. Harnessing the Data Revolution: Five Themes • Science domains: data-intensive research in all areas of science and engineering • Foundations: theoretical foundations in mathematics, statistics, computer & computational science • Systems, Algorithms: data-centric algorithms, systems • Cyberinfrastructure: advanced cyberinfrastructure, accelerating data-intensive research • Education, Workforce: educational pathways, innovations grounded in an education-research-based framework. Research across all NSF Directorates
  • 25. Data-Intensive Science & Engineering: Science Problems and Data Challenges… – Space weather: predictions requiring comprehensive synthesis of diverse sets of observations with state-of-the-art modeling – Multi-messenger astrophysics: Utilizing data collected by the latest generation of observational facilities – ECO/GEO: Ecosystem forecasting: Understanding decadal-scale changes in ecosystems and the resources they provide; integrating data from disparate sources, e.g., NEON, LTER sites, NOAA satellites, EarthCube and other NSF programs… – DMR: Using and integrating data from multiple laboratories, techniques, and/or chemical systems can accelerate discovery of more efficient or selective catalysts. – Themes from other NSF Big Ideas requiring advances in data science…
  • 26. HDR Institutes to Tackle Data Challenges in Data-Intensive Science and Engineering Research • E.g., Real-time data: Sensing, analysis, assimilation, decision-making with real-time data streams • E.g., Integration across multi-scale, multi-modal data, with data assimilation, to enable forecasting, predictions, … • All HDR activities are linked. Institutes must link with: – Foundations: TRIPODS Centers – Systems/Algorithms: Open Knowledge Network, Model Commons – Education and Workforce Development: HDR Academy, Data Science Corps (Theme: Data-Intensive Science & Engineering)
  • 27. HDR Foundations: NSF Transdisciplinary Research in Principles of Data Science (TRIPODS) Program – Establish collaborations among computer and computational scientists, statisticians, and mathematicians – Develop the principles of data science, as distinct from CS, statistics, and mathematics (Theme: Foundations)
  • 28. HDR Systems and Algorithms • Open Knowledge Network – An open web-scale knowledge network of semantically-linked concepts and data – Would foster research on an entire class of new applications leveraging data, context, and inferences from data – Should support question/answer interfaces, dialog-based interactions, explanatory/story-telling interfaces • Model Commons – Enabling sharing and reuse of data-intensive models, including machine learning models – Provides support for reproducibility, transfer learning… (Theme: Systems, Algorithms)
  • 29. Education and Workforce Development: HDR Academy • Catalog, collect, create education/training materials – Collect data science education / training materials from HDR projects – Other related NSF projects – Other, non-NSF sources • HDR postdocs, undergrads – Place postdocs and REU students in “cross-training” positions, e.g., domain scientist placed with a data group, and vice versa • HDR bootcamps – Offer data science bootcamps for grad students; postdocs & junior faculty; senior faculty… (Theme: Education, Workforce)
  • 30. Data Science Corps • Data Science Corps Workshop, Dec 7-8, 2017, McCourt School of Public Policy, Georgetown University, Washington DC • Link data science students and professionals to data science projects in academia, industry, government, and non-profits • Via capstone projects, summer internships, co-op programs, study abroad programs, etc. • Focus especially on community colleges, 4-year colleges, MSIs, etc. Diagram labels: Students from Academic Programs: graduate programs, undergraduate programs, 4-year colleges, community colleges, online programs • Experts from Industry, NGOs: industry, NGOs, volunteer organizations • Project Organizations: industry; NGOs, e.g., Data Science for Social Good, DataKind, etc.; universities, other research institutions; international organizations, e.g., World Bank, UNICEF, ITU; local / county / state / federal governments • Projects in: basic research, Smart & Connected Communities, health, criminal justice, transportation, energy, ... • Skills and expertise • Varying levels of skills, expertise, and experiences (Theme: Education, Workforce)
  • 31. Data Science Education-related Workshops • National Academies of Science Study/Report on Envisioning the Data Science Discipline: The Undergraduate Perspective, May 2018 • NSF Workshop on Keeping Data Science Broad: Negotiating the Digital and Data Divide, Oct 2017 (Theme: Education, Workforce, Outreach)
  • 32. NSF BIGDATA: Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering • Three cloud providers (Amazon Web Services, Google, Microsoft) at $3M each • IBM joined in 2018 at $3M • Researchers may request cloud resources for their projects, within a min and max range.
  • 33. BIGDATA Projects in 2017 using AWS (out of a total of 8 awards, 5 were awarded to researchers using AWS) • Detecting Financial Market Manipulation: An Integrated Data- and Model-Driven Approach (University of Michigan, Georgia Tech) • Scalable and Interpretable machine learning: bridging mechanistic & data-driven modeling in biological sciences (University of California, Berkeley) • Taming Big Networks via Embedding (University of Virginia, University of Illinois at Urbana-Champaign) • Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media (Kansas State University, University of North Texas and Pennsylvania State University) • Distributed Semi-Supervised Training of Deep Models and Its Applications in Video Understanding (University of Central Florida)
  • 34. BIGDATA: Domain Adaptation Approaches for Classifying Crisis Related Data on Social Media. Kansas State University PIs: Doina Caragea, Cornelia Caragea and Dan Andresen. Pennsylvania State University PIs: Andrea Tapia, Jess Kropczynski
  • 35. Iterative Random Forests (iRF) and iRF2.0: Genome-Wide Epistasis Studies (GWES). PI: Bin Yu, co-PI: Ben Brown • iRF and iRF2.0 discover predictive and stable high-order interactions among variables, including biomolecules, for interpretations and follow-up experiments and studies • We study 750K human genotypes along with 30 years of medical records to learn the genetic architecture of depression, heart disease, and prostate cancer
  • 36. Semi-Supervised Training with Generative Adversarial Networks on AWS. University of Central Florida, Drs. Mubarak Shah (PI) and Liqiang Wang (Co-PI) • Semi Supervised Semantic Segmentation Using Generative Adversarial Network (ICCV-2017), N. Souly, C. Spampinato and M. Shah • Improving the Improved Training of Wasserstein GANs (CT-GAN): A Consistency Term and Its Dual Effect (ICLR-2018), X. Wei, B. Gong, Z. Liu, W. Lu, L. Wang • Problem: The gradient penalty 𝑮𝑷|𝒙 often fails to check the continuity of the region near the real data 𝒙 (illustrated by a figure on the slide). • Our Approach: To alleviate the issue, we explicitly check the continuity condition by using two perturbations 𝒙′, 𝒙′′ near any observed real data point 𝒙. Benefits from AWS: • A pre-configured environment to easily build our deep learning programs. • 4 weeks on our two local 1080 GPUs vs. 4 days on one AWS p3.16xlarge instance! An incredible speed!
  • 37. Thank you!