NIH Big Data to Knowledge (BD2K)

NIH – Big Data to Knowledge
• What is BD2K?
• Why is NIH investing $100M in this?




The following slides are highlights and
notes from NIH workshop events

*Information contained here belongs to the author and is not an official viewpoint of the NIH or any other organization

Drivers behind the BD2K grant
• To meet the emerging needs of the biomedical research community
• To create a better research ecosystem
• NIH seeks to invest in ways to help researchers easily find, access, analyze, and curate research data


The Purpose of NIH’s Data Catalog Workshops
• To take steps, independently and in partnership with others, to enable a future state in which clinical data (including electronic health record data) are used effectively to conduct research and improve population health
• Workshop participants engage actively in the discussions, helping NIH develop plans, programs, and funding initiatives to implement BD2K


Challenges
• Data sharing among biomedical researchers is lacking
• There is no technical infrastructure for NIH-funded researchers to easily submit datasets associated with their work
• Those datasets are not available to other researchers
• There is little motivation to share data, since the most common current unit of academic credit is co-authorship in the peer-reviewed literature


NIH’s Goals for BD2K
• To advance basic and translational science by facilitating and enhancing the sharing of research-generated data
• To promote the development of new analytical methods and software for this emerging data
• To increase the workforce in quantitative science toward maximizing the return on the NIH’s public investment in biomedical research


NIH’s Goals for BD2K
• To improve the public’s ability to discover and access data resulting from federally funded research
• Researchers want visual analytics, and to build the database into a “social network”: being able to “friend” or “like” the data

The Model
• When the NIH created ClinicalTrials.gov in collaboration with the Food and Drug Administration (FDA) and medical journals, the resource enabled clinical research investigators to track ongoing or completed trials. Subsequent requirements to enter outcome data have added to its value.
• Establishing an analogous repository of molecular, phenotype, imaging, and other biomedical research data is of great value to the biomedical research community.


NIH is looking for solutions
The development and implementation of analytical methods and software tools valuable to the research community follow a four-stage process:
• Prototyping within the context of targeted scientific research projects
• Engineering within robust software tools that provide appropriate user interfaces and data input/output features for effective community adoption and utilization
• Dissemination to the research community, a process that may require the availability of appropriate data storage and computational resources
• Maintenance and support to address users’ questions, community-driven requests for bug fixes, usability improvements, and new features

The Opportunity
• The training of future data scientists is at stake
• The creation of a platform for scientific communities to share data with citizen groups
• A new science: new discoveries and relationships across data

NIH Data Catalog: Future Vision
• Interoperation with other systems, interdisciplinary collaborations
• “Likes” and citation metrics helping to find relevant datasets
• Non-obvious relationship discovery
• Journals embed links within publications
• Enable learning: educational uses of data
• Return data to the community: patients too can access data

Search is Broken vs. Big Data
• Documents are not just containers for keywords.
• Objects & meanings relate to people, documents, snippets, tweets, journals, doctors, caregivers, patients.
• Search is about the keywords and ignores everything else.

www.ibm.com

Academic Publishing vs. Open Access
• August 2013: Univ. of California approved open access standards for research on all campuses.
• 2012: Harvard Library urged its 2,100 faculty to boycott for-profit academic research databases and instead submit articles to lower-cost open access journals.
• Also, the White House pledged $100 million to promote open access and to require all federally funded research to be free of charge.


Clinical Studies and Collaboration with Pharmaceutical Companies
• The real-world population is rarely reflected in the selected population of a single clinical trial data set. Combining and mining multiple data sets can produce a more holistic view, which is the standard that both patients and regulators expect therapies to be measured against.
• Pharma companies need to embrace the challenge of using combined data sets to uncover insights they did not previously have.
• This has the potential to benefit both the competing companies producing drugs and patients who will have improved outcomes.

Solutions Profile
• There should be a system put in place by NIH/NLM for widespread sharing of data.
• Feedback: “we have the information, but we do not know how to use it.”
• A data system should be created to integrate data types, capture data, and create “space” for raw data.

BD2K Overview
• Investing in technology and tools needed to enable researchers to easily find, access, analyze, and curate research data.
• To increase the capacity of the workforce (both for experts and non-experts) and employ strategic planning to leverage IT advances for the entire NIH community.
• Millions of Americans (citizen scientists) who may want to research their own disease history.


The Citizen Scientist
• 1 million users/patients download their health data; much of it is unreadable.
• Mashups occur to build apps to read health records.
• The biomedical research community is within a few years of the “thousand-dollar human genome needing a million-dollar interpretation.”
