The NIH as a Digital Enterprise:
Implications for PAG
Philip E. Bourne, PhD
Associate Director for Data Science
National Institutes of Health
PAG San Diego
January 11, 2015
What do we mean by the notion of a
Digital Enterprise?
Start by considering how far we have
come in just one researcher’s
career….
Biomedical Research is Becoming
More Digital and FAIR
• Finding
• Accessing
• Integrating
• Reusing
digital research objects
This move from an observational
science to a more analytical science
is being driven by ever increasing
amounts of digital data
The NIH Fire Hose Slide
And This May Just be the Beginning
• Evidence:
  – Google car
  – 3D printers
  – Waze
  – Robotics
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
Further Perturbation:
The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
Stephen Friend
47/53 “landmark” publications
could not be replicated
[Begley, Ellis Nature,
483, 2012] [Carole Goble]
ADDS Mission
Statement
To foster an open ecosystem that
enables biomedical* research to be
conducted as a digital enterprise that
enhances health, lengthens life and
reduces illness and disability
* Includes biological, biomedical, behavioral, social,
environmental, and clinical studies that relate to understanding
health and disease.
Some Goals of the Digital Enterprise
• Cost savings through sharing of best practices
• Sustainability of digital assets
• Collaboration through identification of collaborators at the point of data collection, not publication
• Improved reproducibility through data and methods sharing
• Integration of data types and data and literature to accelerate discovery
Some of Today’s Observations
• Bad news
  – We do not yet have a data sustainability plan
  – Global policies define the why but not the how
  – We do not know how all the data we currently have are used
  – We can’t estimate future supply and demand
  – We need to ramp up training programs in data science
• Good news
  – Genuine willingness to address the problem
  – Global communities are emerging
  – Efficiencies can be achieved
  – BD2K is the beginning of a plan
  – We are beginning to quantify the issues
Sustainability 101
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
What is the NIH Doing to Fulfill
That Promise?
Elements of The Digital Enterprise
Community
Policy
Infrastructure
• Sustainability
• Collaboration
• Training
Virtuous Research Cycle
Policies – Now & Forthcoming
• Data Sharing
  – Genomic data sharing announced
  – Data sharing plans on all research awards
  – Data sharing plan enforcement
    • Machine-readable plan
    • Repository requirements to include grant numbers
http://www.nih.gov/news/health/aug2014/od-27.htm
Policies - Forthcoming
• Data Citation
  – Goal: legitimize data as a form of scholarship
  – Process:
    • Machine-readable standard for data citation (done)
    • Endorsement of data citation for inclusion in NIH biosketch, grants, reports, etc.
    • Example formats for human-readable data citations
    • Slowly work into NLM/NCBI workflow
Infrastructure - The Commons
[Diagram: six BD2K Centers, DDICC, Software, Standards, and Labs]
The Commons
• Digital Objects (with UIDs)
• Search (indexed metadata)
• Computing Platform
[Vivien Bonazzi, George Komatsoulis]
The Commons: Compute Platforms
The Commons Conceptual Framework
Public Cloud Platforms
Super Computing (HPC) Platforms
Other Platforms?
• Google, AWS (Amazon)
• Microsoft (Azure), IBM, other?
• In-house compute solutions
• Private clouds, HPC
  – Pharma
  – The Broad
  – Bionimbus
• Traditionally low access by NIH
The Commons:
Business Model
[George Komatsoulis]
How Might PAG Participate?
• Consider contributing digital research objects to the Commons – data, software, standards, narrative, course materials …
• Initiate your own moves from cylinders of excellence to more integrated and multi-functional data sources
• Work to define new business models for the scientific enterprise
Accelerate This Kind of Study
Pfenning et al. 2014, Science 346: 1333
Generic Needs
• Homogenization of disparate large unstructured datasets
• Deriving structure from unstructured data
• Feature mapping and comparison from image data
• Visualization and analysis of multi-dimensional phenotypic datasets
• Causal modeling of large scale dynamic networks and subsequent discovery
• Utilizing data that are sparsely and irregularly sampled and noisy
BD2K will offer reference datasets and points of domain expertise to explore these questions
Community: Training
Data Science Training Goals
1) Build an OPEN digital framework for data science training: NIH Data Science Workforce Development Center
2) Develop short-term training opportunities: courses, educational resources, etc.
3) Develop the discipline of biomedical data science and support cross-training – OPEN courseware
All goals have a diversity component and mandate
The Biomedical Research Digital Enterprise
Associate Director for Data Science
Scientific Data Council; External Advisory Board
Programmatic Theme / Deliverable: Example Features
• Sustainability / Commons: Cloud – Data & Compute; Search; Security; Reproducibility Standards; App Store
• Education / Training: Coordinate; Hands-on; Syllabus; MOOCs; Community
• Innovation / BD2K: Centers; Training Grants; Catalogs; Standards; Analysis
• Process / Efficiency: Data Resource Support; Metrics; Best Practices; Evaluation; Portfolio Analysis
• Collaboration / Partnerships: ICs; Researchers; Federal Agencies; International Partners; Computer Scientists
NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
Potential Outcomes
• Mobility: improve the outcomes of surgeries in children with cerebral palsy and gait pathology
• Wellness: markers derived from constantly monitored eHealth/mobile health devices – apply to smoking cessation, weight loss
• Cancer: further personalization of treatment
• Mental Health: better identify factors that resist and promote brain disease, e.g., schizophrenia, bipolar disorder, major depression, attention deficit hyperactivity disorder (ADHD), obsessive compulsive disorder (OCD), autism
• Addiction: utilizing social media to track and treat drug use and addiction


Editor's Notes

  • #11 Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124 http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328