NIH Big Data to Knowledge (BD2K)



Why is the NIH investing $100M at the intersection of data science and health research? The NIH seeks to invest in ways to help researchers easily find, access, analyze, and curate research data. Researchers want visual analytics, and to build the database into a “social network” – being able to “friend” or “like” the data.




NIH Big Data to Knowledge (BD2K) Presentation Transcript

  • NIH – Big Data to Knowledge: What is BD2K? Why is NIH investing $100M in this? The following slides are highlights and notes from NIH workshop events. *Information contained here belongs to the author and is not an official viewpoint of the NIH or any other organization.
  • Drivers behind the BD2K grant: To meet the emerging needs of the biomedical research community. To create a better research ecosystem. NIH seeks to invest in ways to help researchers easily find, access, analyze, and curate research data.
  • The Purpose of NIH’s Data Catalog Workshops: To take steps, independently and in partnership with others, to enable a future state in which clinical data (including electronic health record data) are used effectively to conduct research and improve population health. Workshop participants engage actively in the discussions, helping NIH develop plans, programs, and funding initiatives to implement BD2K.
  • Challenges: Data sharing among biomedical researchers is lacking. There is no technical infrastructure for NIH-funded researchers to easily submit datasets associated with their work. Those datasets are not available to other researchers. There is little motivation to share data, since the most common current unit of academic credit is co-authorship in the peer-reviewed literature.
  • NIH’s Goals for BD2K: To advance basic and translational science by facilitating and enhancing the sharing of research-generated data. To promote the development of new analytical methods and software for this emerging data. To increase the workforce in quantitative science, with the aim of maximizing the return on the NIH’s public investment in biomedical research.
  • NIH’s Goals for BD2K  To improve the public’s ability to discover and access data resulting from federally funded research  Researchers want visual analytics, and to build the database into a ―social network‖ – being able to ―friend‖ or ―like‖ the data
  • The Model: When the NIH created a clinical trials registry in collaboration with the Food and Drug Administration (FDA) and medical journals, the resource enabled clinical research investigators to track ongoing or completed trials. Subsequent requirements to enter outcome data have added to its value. Establishing an analogous repository of molecular, phenotype, imaging, and other biomedical research data would be of great value to the biomedical research community.
  • NIH is looking for solutions: The development and implementation of analytical methods and software tools valuable to the research community follow a four-stage process. (1) Prototyping within the context of targeted scientific research projects. (2) Engineering into robust software tools that provide appropriate user interfaces and data input/output features for effective community adoption and utilization. (3) Dissemination to the research community, a process that may require the availability of appropriate data storage and computational resources. (4) Maintenance and support to address users’ questions and community-driven requests for bug fixes, usability improvements, and new features.
  • The Opportunity: The training of future data scientists is at stake. The creation of a platform for scientific communities to share data with citizen groups. A new science: new discoveries and relationships across data.
  • NIH Data Catalog: Future Vision. Interoperation with other systems and interdisciplinary collaborations. “Likes” and citation metrics that help researchers find relevant datasets. Non-obvious relationship discovery. Journals embed links within publications. Enable learning: educational uses of data. Return data to the community: patients too can access data.
  • Search is Broken vs. Big Data: Documents are not just containers for keywords. Objects and meanings relate to people, documents, snippets, tweets, journals, doctors, caregivers, and patients. Search is about the keywords and ignores everything else.
  • Academic Publishing vs. Open Access: August 2013 – Univ. of California approved open access standards for research on all campuses. 2012 – Harvard Library urged its 2,100 faculty to boycott for-profit academic research databases and instead submit articles to lower-cost open access journals. Also, the White House pledged $100 million to promote open access and to require all federally funded research to be free of charge.
  • Clinical Studies and Collaboration with Pharmaceutical Companies: The real-world population is rarely reflected in the selected population of a single clinical trial data set. Combining and mining multiple data sets can produce a more holistic view, which is the standard that both patients and regulators expect therapies to be measured against. Pharma companies need to embrace the challenge of using combined data sets to uncover insights they did not previously have. This has the potential to benefit both the competing companies producing drugs and the patients, who will have improved outcomes.
  • Solutions Profile: There should be a system put in place by NIH/NLM for widespread sharing of data. Feedback: “we have the information, but we do not know how to use it.” A data system should be created to integrate data types, capture data, and create “space” for raw data.
  • BD2K Overview: Investing in the technology and tools needed to enable researchers to easily find, access, analyze, and curate research data. To increase the capacity of the workforce (both for experts and non-experts) and employ strategic planning to leverage IT advances for the entire NIH community. To serve the millions of Americans (citizen scientists) who may want to research their own disease history.
  • The Citizen Scientist: 1 million users/patients download their health data, but much of it is unreadable. Mashups occur to build apps that read health records. The biomedical research community is within a few years of the “thousand-dollar human genome needing a million-dollar interpretation.”