Introduction to Big Data and its Potential for Dementia Research

David De Roure

Overview

• What do we mean by Big Data?
• Role in medical research
• Impact on future research
• Application to dementia research
• Challenges and issues
...the imminent flood of
 scientific data expected
 from the next generation of
 experiments, simulations,
 sensors and satellites


   Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
http://www.slideshare.net/RockHealth/rock-report-big-data
Linked Data

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards
4. Include links to other URIs so that they can discover more things

Investment is worthwhile when the data is:
• Discoverable
• Reusable
• Linkable
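
The four Linked Data principles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every `http://example.org/...` URI, the vocabulary terms, and the helper names `describe`, `linked_uris` and `discover` are hypothetical. An in-memory list of tuples stands in for a real Web lookup that would return data in a standard format such as RDF.

```python
from urllib.parse import urlparse

# Hypothetical data: each entry is (subject, property, value).
# All example.org URIs are invented for illustration only.
TRIPLES = [
    ("http://example.org/study/cohort-1",
     "http://example.org/vocab/title",
     "Memory clinic cohort"),
    ("http://example.org/study/cohort-1",
     "http://example.org/vocab/hasParticipant",
     "http://example.org/person/p-001"),
    ("http://example.org/person/p-001",
     "http://example.org/vocab/hasScan",
     "http://example.org/scan/mri-17"),
    ("http://example.org/scan/mri-17",
     "http://example.org/vocab/modality",
     "MRI"),
]


def is_http_uri(value):
    """Principles 1 and 2: things are named with HTTP URIs."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def describe(uri):
    """Principle 3: looking up a URI returns useful information about it."""
    return [(prop, value) for subject, prop, value in TRIPLES if subject == uri]


def linked_uris(uri):
    """Principle 4: a description includes links to other URIs."""
    return [value for _, value in describe(uri) if is_http_uri(value)]


def discover(start):
    """Follow links breadth-first from a starting URI to find more things."""
    seen = [start]
    queue = [start]
    while queue:
        current = queue.pop(0)
        for nxt in linked_uris(current):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen
```

Starting from the study URI, `discover` reaches the participant and then the scan, which is the sense in which linked data is discoverable and linkable.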
http://www.ajtmh.org/content/86/1/39.full
http://stanmed.stanford.edu/2012summer/
http://www.slideshare.net/RockHealth/rock-report-big-data
BioEssays, 26(1):99–105, January 2004

http://research.microsoft.com/en-us/collaboration/fourthparadigm/
A Big Picture
[Quadrant diagram. Vertical axis: More machines. Horizontal axis: More people.]

  e-infrastructure: Big Data, Big Compute   |   The Future!
  Conventional Computation                  |   Social Networking (online R&D)
[Figure labels: method, data]
Some Social Machines




             Nigel Shadbolt
Notifications of new data and results, automatic re-runs of analysis pipelines

[Diagram labels: Autonomic Curation, Self-repair, New research?]

• Automation assists the scientist
• Use the computational capability
• Scale the capability to the problem, not the problem to the desktop

Machines are users too
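
The idea of notifications triggering automatic re-runs of an analysis pipeline can be sketched as a simple publish/subscribe arrangement. This is a minimal illustration, not a description of any particular system: the class names `DataStore` and `MeanPipeline` are hypothetical, and a running mean stands in for a real analysis.

```python
class DataStore:
    """Holds records and notifies subscribed pipelines when new data arrives."""

    def __init__(self):
        self._records = []
        self._subscribers = []

    def subscribe(self, callback):
        # A subscriber is any callable that accepts the current list of records.
        self._subscribers.append(callback)

    def add(self, record):
        self._records.append(record)
        snapshot = list(self._records)  # pass a copy so pipelines cannot mutate the store
        for callback in self._subscribers:
            callback(snapshot)


class MeanPipeline:
    """A stand-in analysis that is re-run on every notification."""

    def __init__(self):
        self.runs = 0
        self.result = None

    def __call__(self, records):
        self.runs += 1
        self.result = sum(records) / len(records)


store = DataStore()
pipeline = MeanPipeline()
store.subscribe(pipeline)  # here the machine, not a person, is the user of the data

for value in (2.0, 4.0, 6.0):
    store.add(value)  # each new record triggers an automatic re-run
```

The scientist never calls the pipeline directly. The result is kept up to date as data arrives, which is one reading of "automation assists the scientist".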
Massachusetts General Hospital (MGH) will receive $5.4
million from the nonprofit Cure Alzheimer’s Fund, in what the
fund said was the largest single private scientific grant ever
invested in Alzheimer’s whole-genome sequencing focused
on families with the disease.

Over the next 12 to 18 months, the Alzheimer’s Genome
Project will obtain complete genomic sequences of more than
1,500 patients in families that have Alzheimer’s, and will
include over 100 brain samples. The genomes of family
members with Alzheimer’s will be compared to those
members who have been spared the disease to identify sites
in the genome that influence risk for Alzheimer’s.
Tim Clark
http://www.genengnews.com/gen-news-highlights/mgh-wins-5-4m-grant-toward-sequencing-for-alzheimer-s-risk/81247502/
Significant added value through
appropriate additional data collection
Big Data methodology

www.methodbox.org
Troublesome Threes
• 3 Ingredients
   – Data; Models; Expertise
• 3 Myths
   – Big data warehouses are the solution
   – Science provides the models to utilise the data
   – Clinicians will continue to be the main source of data
• 3 Pipelines
   – R&D; Quality Improvement; Payor & Public Health

Challenge conventional assumptions

Iain Buchan
Big Data in Context

Datasets (+ models), searched by experts
   or
Data, Models, Expertise: a “sense-making network”

Iain Buchan
Closing thoughts
1. Big Data is not just a quantitative change; it is a methodological change – using digital methods
   • Use what we already have (in silos)
2. Tremendous opportunity to collect additional data with significant impact on dementia research
   • Surveys and social machines
   • Data from instrumenting the care process today
3. Think sociotechnical – community matters
   • Method sharing and usage add value
   • Machines are users too – assistance vs automation
david.deroure@oerc.ox.ac.uk
www.oerc.ox.ac.uk/people/dder
www.scilogs.com/eresearch
@dder
Personal slide credits: Nigel Shadbolt, Tim Clark, Iain Buchan

Editor's Notes

  • #13 Big Data and Big Compute and Big Society! Look at astronomy, for example. Different rates of progress along the axes: one futurological theory says we need a lot more machines to assist, because machines scale further than people.