Data Analytics
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
National Institutes of Health
UM iSchool
November 3, 2015
Pre-reading
Health Informatics: Practical Guide for
Healthcare and Information
Technology Professionals
Chapter 3 Healthcare Data Analytics
William Hersh
This is a Conversation NOT a Lecture
We Have Been Successful
World Climate Report 2011
http://www.cnet.com/news/china-unseats-u-s-in-supercomputer-ranking/
Why Now?
Harnessing Data to Improve Health:
BD2K (Big Data to Knowledge)
NIH’s 6-year initiative to use data science to foster an
open digital ecosystem that will accelerate efficient,
cost-effective biomedical research to enhance health,
lengthen life, and reduce illness and disability
Programs and activities:
Advance discovery for biomedical research
Facilitate use and re-use of biomedical data
Develop analytical methods and software
Enhance biomedical data science training
Big Data in the Life Sciences …
This speaks to something more
fundamental than more data …
It speaks to new methodologies, new
skills, new emphasis, new cultures,
new modes of discovery …
The History of Computational
Biomedicine According to Bourne
1980s 1990s 2000s 2010s 2020
Discipline:
Unknown Expt. Driven Emergent Over-sold A Service A Partner A Driver
The Raw Material:
Non-existent Limited/Poor More/Ontologies Big Data/Siloed Open/Integrated
The People:
No name Technicians Industry recognition data scientists Academics
Searls (ed.), The Roots of Bioinformatics series, PLOS Comp Biol
Consider what the expert
prophets are saying …
We are at a Point of Deception …
• Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
Example - Photography
[Figure: 6D curve (Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization); axes: Time vs. Volume, Velocity, Variety]
– Digital camera invented by Kodak but shelved
– Megapixels & quality improve slowly; Kodak slow to react
– Film market collapses; Kodak goes bankrupt
– Phones replace cameras
– Instagram, Flickr become the value proposition
– Digital media becomes a bona fide form of communication
We Are at a Point of Deception
The 6D Exponential Framework
– Digitization: basic & clinical research and EHRs
– Deception: we are here
– Disruption
– Demonetization
– Dematerialization
– Democratization
Open science
Patient-centered health care
What Are Some General Implications
of Such a Future?
• Open, collaborative science becomes increasingly important
• Data and associated analytics become increasingly valuable to scholarship
• Opportunities exist to improve the efficiency of the research enterprise and hence fund more research
• Current training content and modalities will not supply enough trained people to meet demand
• Balancing accessibility vs. security becomes more important yet more complex
An Example of That Promise:
Comorbidity Network for 6.2M Danes
Over 14.9 Years
Jensen et al. 2014, Nat Commun 5:4022
“And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
Precision Medicine Initiative
• National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
• Participants will be centrally involved in the design and implementation of the cohort
• They will be able to share genomic data, lifestyle information and biological samples, all linked to their electronic health records
Center of Excellence for Mobile
Sensor Data-to-Knowledge (MD2K)
Santosh Kumar, Ph.D.
Director, MD2K Center of Excellence
Professor & Moss Chair of Excellence in Computer Science
University of Memphis
https://datascience.nih.gov/bd2k/funded-programs/centers
MD2K Applications – Congestive Heart Failure (CHF) and Smoking
Example: BD2K Center Working Across Strategic Areas
Strategic areas:
– Sustainability
– Workforce Development & Diversity
– Discovery & Innovation
– Policy & Process
– Leadership
Examples:
– Research Objects in the Commons
– Voxel-Wide Genome Scanning
– MRI standardization
– Over 100 public lectures
– Collaboration with a minority institution
– 185 institutions involved
– Genomic Data Sharing Policy
BD2K Targeted Software Topics
Supports innovative analytical methods and software tools
that address critical current and emerging needs of
biomedical research
2015 Topics (18 awards, U01s)
– Data Compression
– Data Provenance
– Data Visualization
– Data Wrangling
2016 Topics (U01s, under review)
– Data Privacy
– Data Repurposing
– Applying Metadata
– 2016: Crowdsourcing and Interactive Digital Media
(UH2)
Goal: To strengthen the ability of a
diverse biomedical workforce to develop
and benefit from data science
Key Chapter Points
• Provenance
• Proof of the value of analytics is still forming
• Workforce: shortage
I not only use all the brains
I have, but all I can borrow.
– Woodrow Wilson
The Team
NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/


Editor's Notes

  • #7 Updated by ADDS group 8/25/15
  • #15 16 million hospital inpatient events (24.5% of total), 35 million outpatient clinic events (53.6% of total) and 14 million emergency department events (21.9% of total)
  • #17 Photos: FC tweet; RK screen grab
  • #18 Images of people from Infographic (NOTE: Image is just a placeholder; Jill will tweak) Detailed Notes: National Research Cohort <<OR name of study>>. >1 million U.S. volunteers committed to participating in research. Will combine a number of existing cohorts. Will include the Dept of Veterans Affairs Million Veteran Program (note: "Veteran" is singular, per http://www.research.va.gov/MVP/)
  • #22 Detected 8 genetic variants influencing the volume of brain structures, providing insight into brain development and neuropsychiatric dysfunction. MRI images from >30,000 people. Meta-analysis of GWAS data from >13,000 people. Replicated results with data from >17,000 people. Designed standardized protocols for image analysis, quality assessment, genetic imputation, and association. Developed 3D models for 1,500 subjects. Used freely available software for measurements.
  • #24 Short term: produce a searchable catalog of physical and virtual courses; fund diversity awards to work with BD2K Centers; expand IRP training started Jan 2015, e.g., Software Carpentry and Train the Trainers. Long term: evaluation.