Enabling simultaneous analysis of multiple
cohort studies without accessing the full
dataset: a BRISSKit use case
Dr Jonat...
http://www.astrogrid.org
(April 2008 1st public release)
Data Reuse: asking new questions

•

Hubble Space Telescope
Papers based upon reuse of archived observations now exceed th...
Why open? an Open Enterprise Report
Science as
•

•

•

•

As a first step towards this intelligent
openness, data that un...
BRISSKit

context:

The I4Health goal of applying knowledge engineering to close the
‘ICT gap’ between research and health...
Biomedical Research Infrastructure Software Service Kit
A vision for cloud-based open source research applications
#BRISSK...
http://www.brisskit.le.ac.uk
BRISSKit USPs












Integrated support for core research processes

Well-established mature open source applic...
www.brisskit.le.ac.uk
Email: brisskit@le.ac.uk
BRISSKit Community & Hack Event, Oct 2012
http://www.brisskit.le.ac.uk/node/35
BRISSKit Information Governance
& Security Management Work Stream
- Dr Andrew Burnham leading
1.

Information Governance T...
The semantic bridge
OBiBa Onyx
Records participant
consent, questionnaire
data and primary
specimen IDs

Bio-ontology!

i2...
BRISSKit and Bio Banking
• Deploy solutions in international bio banking
initiatives
• Investment through Prof Paul Burton...
Contemporary biobanking:
meeting the “data”
challenge
Large data sets, why bother?

•Sample size
•Depth of phenotyping
•Quality of measurement
All critical
How big is BIG?
 The direct effect of a gene
• 2,000 cases minimum, 10,000 cases better

 Environmental and life-style f...
The bottom line

• Scientific harmonization
• Restriction on access to individual level data
• Streamlined access to multi...
Horizontally partitioned data
Data
Data

1958BC

Data
KORAGEN

PREVEND

Data
FINRISK

Joint central
analysis

How can we u...
DataSHIELD: a novel solution
Take analysis to data not
data to analysis
One step analyses: simple
Iterative analyses: para...
Horizontal
DataSHIELD
Data computer

Opal
Finrisk

R
Data computer
Opal
Prevend

Web services

Web services BioSHaRE Web s...
Horizontal
DataSHIELD
Data computer

Opal includes
• DataSHIELD
• DataSHaPER
• Researcher ID

Opal
Finrisk

R
Data compute...
Horizontal
DataSHIELD
Work in progress:
• Embed Opal in
BRISSKit
Data computer

• ALSPAC
• MRC e-HIRCS
• +more…

Opal
Finr...
Opal
1958BC
Opal
1958BC

BRISSKit gains
• DataSHIELD
• DataSHaPER
• Researcher ID
Opal gains
• Direct interface
with more tools
• I2B2 functionality
•Potential for enhanced
user interface

Opal
1958BC

BR...
Opal gains
• Direct interface
with more tools
• I2B2 functionality
•Potential for enhanced
user interface

Opal
1958BC

BR...
Opal gains
• Direct interface
with more tools
• I2B2 functionality
•Potential for enhanced
user interface

Opal
1958BC

BR...
The bottom line

• Scientific harmonization
• Restriction on access to individual level data
• Streamlined access to multi...
Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
Upcoming SlideShare
Loading in …5
×

Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case

409 views

Published on

A description of BRISSKit, an open source tool that may be used to combine datasets held in different locations and analyse them for the purpose of research. Talk give by Jonathan Tedds of Leicester Uni. for the Data Management in Practice workshop, which took place on Nov 14th 2013 at the London School of Hygiene and Tropical Medicine

Published in: Technology, Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
409
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
3
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide
  • Hubble Space Telescope (HST) in operation since 1990Observations are made on the basis of prposals, data is collected and made available to the proposers; data is stored at the Space Telescope Science Institute and made available after an embargo.Each year approx 200 proposals are selected from a field of 1,000; leading to c. 20,000 individual observationsThere are now more research papers published on the basis of ‘reuse’ of the archived data than those based on the use described in the original proposal.
  • SAOE report looked at the changing conduct of science.Key recommendations are the research data, the data underpinning research findings should be as openly available as possible.
  • Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case

    1. 1. Enabling simultaneous analysis of multiple cohort studies without accessing the full dataset: a BRISSKit use case Dr Jonathan Tedds jat26@le.ac.uk @jtedds Senior Research Fellow, Health Informatics & Interdisciplinary Research Group, Department of Health Sciences (University of Leicester) PI #BRISSKit http://www.brisskit.le.ac.uk
    2. 2. http://www.astrogrid.org (April 2008 1st public release)
    3. 3. Data Reuse: asking new questions • Hubble Space Telescope Papers based upon reuse of archived observations now exceed those based on the use described in the original proposal. – http://archive.stsci.edu/hst/bibliography/pubstat.html • See also work by Piwowar & Vision re life sciences: “Data reuse and the open data citation advantage” – http://peerj.com/preprints/1/
    4. 4. Why open? an Open Enterprise Report Science as • • • • As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable. [p.7] Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science -public-enterprise/report/ Issues linking data to the scientific record: – – – • Data persistence Data and metadata quality Attribution and credit for data producers Geoffrey Boulton (Edinburgh), Lead author: – “Science has been sleepwalking into crisis of replicability...and of the credibility of science” – “Publishing articles without making the data available is scientific malpractice”
    5. 5. BRISSKit context: The I4Health goal of applying knowledge engineering to close the ‘ICT gap’ between research and healthcare (Beck, T. et al 2012)
    6. 6. Biomedical Research Infrastructure Software Service Kit A vision for cloud-based open source research applications #BRISSKit http://www.brisskit.le.ac.uk
    7. 7. http://www.brisskit.le.ac.uk
    8. 8. BRISSKit USPs       Integrated support for core research processes Well-established mature open source applications as protoyped in Cardiovascular, Respiratory, Cancer Theme Biobank: UK customised A platform for seamless management and integration between applications An API allows integration with existing clinical systems Easy set up, use and administration through browser (including on mobile devices) Capability of being hosted in any compliant cloud provider including UHL (NHS information governance)
    9. 9. www.brisskit.le.ac.uk Email: brisskit@le.ac.uk
    10. 10. BRISSKit Community & Hack Event, Oct 2012 http://www.brisskit.le.ac.uk/node/35
    11. 11. BRISSKit Information Governance & Security Management Work Stream - Dr Andrew Burnham leading 1. Information Governance Toolkit - analysis of Department of Health (DoH/NHS) IGT requirements vs. BRISSKit organisation/project and services/tools a) Hosted Secondary Use Team/project (Hosted IGT) b) Acute Trust (Acute Trust IGT) 2. IG Training Tool (NHS – University is registered) 3. Pseudonymisation requirements 4. Data Management Plan 5. IT Security & standards – Penetration Testing & Security Testing 6. Other NHS Standards/Requirements: - Care Records Guarantee - NHS Constitution - NHS Records Management - Patient Safety DSCN 14/2009, 18/2009
    12. 12. The semantic bridge OBiBa Onyx Records participant consent, questionnaire data and primary specimen IDs Bio-ontology! i2b2 Cohort selection and data querying ?
    13. 13. BRISSKit and Bio Banking • Deploy solutions in international bio banking initiatives • Investment through Prof Paul Burton (Health Sciences at Leicester/Bristol) & international collaborations • Building on strong informatics expertise at University of Leicester in partnership with the University Hospitals Leicester Trust • Cardiovascular, Respiratory & Lifestyle BRUs • Cancer Theme Biobank • Genomics etc
    14. 14. Contemporary biobanking: meeting the “data” challenge
    15. 15. Large data sets, why bother? •Sample size •Depth of phenotyping •Quality of measurement All critical
    16. 16. How big is BIG?  The direct effect of a gene • 2,000 cases minimum, 10,000 cases better  Environmental and life-style factors • Highly context specific: from hundreds to tens of thousands of cases  Gene-lifestyle and gene-gene “interactions” • Absolute minimum 10,000, usually need at least 20,000, a comprehensive platform needs at least 50,000 • Scientifically fundamental
    17. 17. The bottom line • Scientific harmonization • Restriction on access to individual level data • Streamlined access to multiple data sets  Central to the integrative aims of P3G, PHOEBE, BioSHaRE-eu etc  Also fundamental to the aims of potential BRISSKit users 18  Effective data access is crucial  Effective joint analysis is essential too (integration)  Fundamental challenges
    18. 18. Horizontally partitioned data Data Data 1958BC Data KORAGEN PREVEND Data FINRISK Joint central analysis How can we undertake a full joint analysis using multiple data sources if the data cannot physically be pooled?  Ethico-legal constraints  Physical size of the data objects  Intellectual property issues
    19. 19. DataSHIELD: a novel solution Take analysis to data not data to analysis One step analyses: simple Iterative analyses: parallel processes linked together by entirely non-identifying summary statistics Typically produces mathematically identical results to fitting a single model to all the data held in one pooled data set
    20. 20. Horizontal DataSHIELD Data computer Opal Finrisk R Data computer Opal Prevend Web services Web services BioSHaRE Web services web site R Web services R Analysis Computer Data computer Opal 1958BC R
    21. 21. Horizontal DataSHIELD Data computer Opal includes • DataSHIELD • DataSHaPER • Researcher ID Opal Finrisk R Data computer Opal Prevend Web services Web services BioSHaRE Web services web site R Web services R Analysis Computer Data computer Opal 1958BC R
    22. 22. Horizontal DataSHIELD Work in progress: • Embed Opal in BRISSKit Data computer • ALSPAC • MRC e-HIRCS • +more… Opal Finrisk R Data computer Opal Prevend Web services Web services BioSHaRE Web services web site R Web services R Analysis Computer Data computer Opal 1958BC R
    23. 23. Opal 1958BC
    24. 24. Opal 1958BC BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID
    25. 25. Opal gains • Direct interface with more tools • I2B2 functionality •Potential for enhanced user interface Opal 1958BC BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID
    26. 26. Opal gains • Direct interface with more tools • I2B2 functionality •Potential for enhanced user interface Opal 1958BC BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID Everybody gains • Enhanced combined functionality - better science • Bigger user group - greater portability • Greater potential to become a sustainable standard
    27. 27. Opal gains • Direct interface with more tools • I2B2 functionality •Potential for enhanced user interface Opal 1958BC BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID Everybody gains • Enhanced combined functionality - better science • Bigger user group - greater portability • Greater potential to become a sustainable standard Enhanced joint analysis with • Ethico-legal constraints e.g.US/Europe biobanks • Intellectual property issues e.g. H3AFRICA
    28. 28. The bottom line • Scientific harmonization • Restriction on access to individual level data • Streamlined access to multiple data sets  Central to the integrative aims of P3G, PHOEBE, BioSHaRE-eu etc  Also fundamental to the aims of potential BRISSKit users 29  Effective data access is crucial  Effective joint analysis is essential too (integration)  Fundamental challenges

    ×