Clinical trial data wants to be free: Lessons from the ImmPort Immunology Data and Analysis Portal


Presentation to the Clinical and Research Ethics Seminar, Clinical and Translational Science Center, Buffalo, January 21, 2014
https://immport.niaid.nih.gov/
http://youtu.be/booqxkpvJMg

Published in: Health & Medicine

  1. Clinical Trial Data Wants to be Free
     Barry Smith
  2. Publish all drug trial results, says Dr Ben Goldacre, 19 June 2013: http://www.bbc.co.uk/news/uk-politics-22957195
  3. John Holdren, Director of the Office of Science and Technology Policy, “has directed Federal agencies with more than $100M in R&D expenditures to develop plans to make the published results of federally funded research freely available to the public within one year of publication and requiring researchers to better account for and manage the digital data resulting from federally funded scientific research.”
  4. Increasingly, these data will be digitalized
  5. Can we take care of the problems here, at least prospectively?
  6. Paper is giving way to digitalized data
     • As more studies come on-line, the problems involved in making them available to automated analysis will only get worse
     • What we need is prospective standardization of a useful sort – using standards that will make the life of trialists easier and also increase the value of the data they produce for secondary use
  7. Ought implies can
  8. Ought implies can
     Complete clinical trial data can be made freely available, in deidentified form
  9. https://immport.niaid.nih.gov/
  10. Complete, deidentified data for 89 trials
  11. DAIT-funded Projects Depositing Data Into ImmPort
     • Immune Tolerance Network (ITN)
     • Atopic Dermatitis Research Network (ADRN)
     • Population Genetics Analysis Program
     • HLA Region Genetics in Immune-Mediated Diseases
     • Clinical Trials in Organ Transplantation in Children (CTOT-C)
     • Consortium of Food Allergy Research (CoFAR)
     • Renal and Lung Living Donors Evaluation Study (RELIVE)
     • The Inner City Asthma Consortium (ICAC)
  12. ImmPort Team
     • Northrop Grumman Information Technology Health Solutions
     • Stanford University: Atul Butte (PI), Mark Davis (co-PI), Garry Nolan (co-PI), Ravi Shankar
     • University of Buffalo: Barry Smith (co-PI), Alex Diehl, Alan Ruttenberg
     • Technion Israel Institute of Technology: Shai Shen-Orr
  13. Why do we want the data to be free?
     • Education
     • Replication of results
     • Scientific scrutiny / economy
     • Secondary use
     • New biomedical discovery, including DIY science
     • New -omics / Big Data start-ups
     • Reanalysis of original results
       o Linking existing trial data to new bioinformatics discoveries
       o Harvesting existing trial data by creating new, virtual metatrials
  14. Tutorial and Workshop: Ontology and Imaging Informatics
     • Would the training of pathologists (or other professionals) change if hundreds/thousands of trial-labeled images were publicly available?
  15. Cooperative Clinical Trials in Pediatric Transplantation (CCTPT): Four studies (PRELIMINARY)
     Study | # Arms | Participating centers | Length of follow-up (years) | Transplant years
     CN01  | 1      | 4                     | 3                           | 2001-03
     SW01  | 2      | 19                    | 3                           | 2001-04
     SNS01 | 2      | 12                    | 3                           | 2004-06
     PC01  | 1      | 4                     | 2                           | 2005-07
     [Timeline figure: studies CN01, SW01, SNS01 and PC01 plotted against transplant years 2001-2007]
  16. PRELIMINARY
  17. Common follow-up time points for 4 studies (PRELIMINARY)
     5 time points are in common: time 0, 3, 6, 12, and 24 months post-transplant
  18. ITN Data (with thanks to Ravi Shankar)
     • Flow Cytometry data (yellow)
     • PCR data (green)
     • Study Protocol, Operational data, Clinical data (blue)
     • HLA data (purple)
     • Specimen Management Data (green)
  19. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant
     Diagram labels: Protocol Group Schedule of Events Assay Group 0 Specimen Table 0
  20. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant
     Diagram labels: CRO CRF Protocol Group Schedule of Events Assay Group 0 Specimen Table 0 Day 0, Transplant
  21. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant
     Diagram labels: CRO CRF Protocol Group Schedule of Events Assay Group 0 0 Specimen Table Operations Group Tube Table v0 Day 0, Transplant
  22. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant
     Diagram labels: CRO Day 0, Transplant CRF Protocol Group Schedule of Events Assay Group 0 0 Tube Manufacturer Specimen Table Operations Group Tube Table v0 Kit Report Cimarron v0 ImmunoTrak v0, Visit 0
  23. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant
     Diagram labels: CRO Day 0, Transplant Core Labs CRF v0 Assays Protocol Group Schedule of Events Assay Group 0 0 Tube Manufacturer Specimen Table Operations Group Tube Table v0 Kit Report Cimarron v0 ImmunoTrak v0, Visit 0
  24. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant
     Diagram labels: CRO Day 0, Transplant Core Labs CRF v0 Assays Protocol Group Schedule of Events Assay Group 0 0 Tube Manufacturer Specimen Table Operations Group Tube Table Data Center v0 Database Kit Report Cimarron v0 ImmunoTrak v0, Visit 0
  25. What is in a visit name? Visit 0, v0, v 0, 0, Day 0, Transplant
     Diagram labels: CRO Day 0, Transplant Core Labs v0 CRF Assays Data Center Assay Group 0 Protocol 0 Group Specimen Table Tube Manufacturer v 0 Database Kit Report Schedule of Events Operations v0 Group Tube Table Cimarron v0, Visit 0 ImmunoTrak
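The visit-name problem shown on these slides can be made concrete with a small normalizer. The following is only a sketch under stated assumptions: the function names, the regular expression, and the rule that "Transplant" denotes visit 0 are illustrative choices, not ImmPort's actual procedure. In particular, treating "Day N" and "Visit N" as interchangeable is a simplification that is only known to hold for the baseline labels listed on the slide.

```python
import re
from typing import Optional

# Hypothetical table of named events that stand for a numbered visit.
NAMED_VISITS = {"transplant": 0}

# Optional prefix ("visit", "day" or "v"), optional spaces, then the number.
_VISIT_PATTERN = re.compile(r"^(?:visit|day|v)?\s*(\d+)$")


def _normalize_part(part: str) -> Optional[int]:
    """Resolve one label fragment to a visit number, or None."""
    text = part.strip().lower()
    if text in NAMED_VISITS:
        return NAMED_VISITS[text]
    match = _VISIT_PATTERN.match(text)
    return int(match.group(1)) if match else None


def normalize_visit(label: str) -> Optional[int]:
    """Return the canonical visit number for a label, or None if unclear.

    A compound label such as "Day 0, Transplant" is accepted only when
    every comma-separated part resolves to the same visit.
    """
    resolved = {_normalize_part(part) for part in label.split(",")}
    return resolved.pop() if len(resolved) == 1 else None


labels = ["Visit 0", "v0", "v 0", "0", "Day 0", "Transplant", "Day 0, Transplant"]
print({label: normalize_visit(label) for label in labels})
```

Labels that the rules do not cover return `None` rather than a guess, so unmapped names surface for manual review instead of being silently merged.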
  26. Mappings between protocol, lab tests and mechanistic assays were missing
     • Lab Tests (Study Time collected)
     • Allergy Score (Study Collection Day)
     • Microarray Data (Only Visit)
     • Flow (Collection_Study_day and Visit)
  27. How are these problems currently being solved? Hard work
     Problems with hard work:
     • Does not scale
     • Does not comport with the vision underlying ImmPort – that we can transform clinical medicine from an art into an (information-driven) science, based on repeatable processes documented in advance
  28. Goals of ImmPort
     • Accelerate a more collaborative and coordinated research environment
     • Create an integrated database that broadens the usefulness of scientific data
     • Advance the pace and quality of scientific discovery
     • Integrate relevant data sets from participating laboratories, public and government databases, and private data sources
     • Promote rapid availability of important findings
     • Provide analysis tools to advance immunological research
  29. Pipeline
     • perform study & collect data
     • analyze data (SAS …)
     • submit data to ImmPort
     • process & deidentify data in ImmPort
     • discover, aggregate, analyze data in ImmPort
  30. Pipeline
     • PIs, hospitals, biostatisticians
     • Northrop Grumman
     • Max & Mindy
  31. Alternatives to the strategy of hard work
     • perform study & collect data; analyze data (SAS …); submit data to ImmPort: PIs, hospitals, biostatisticians
     • process & deidentify data in ImmPort: Northrop Grumman
     • discover, aggregate, analyze data in ImmPort: Stanford (Max & Mindy)
  32. What we have currently
     • PIs, hospitals, biostatisticians, CROs …
     • Northrop Grumman
     • Stanford (Max & Mindy)
     • Lots of free text, local formats, local standards, local terminologies operating here
  33. Semantic Web strategy of post-coordination via arms-length enhancement of data
     • PIs, hospitals, biostatisticians, CROs …
     • Lots of free text, local formats, local standards, local terminologies operating here
     • Northrop Grumman
     • Stanford (Max & Mindy)
     • Uniform standards and ontologies applied post hoc
  34. The problem with this post hoc strategy is that it still requires the same amount of hard work
     • PIs, hospitals, biostatisticians, CROs …
     • Lots of free text, local formats, local standards, local terminologies operating here
     • Northrop Grumman
     • Stanford (Max & Mindy)
     • Uniform standards and ontologies applied post hoc
  35. A preferred but much more ambitious strategy: Pre-coordination
     • Identify uniform standards that can be applied already here: PIs, hospitals, biostatisticians, CROs …
     • Northrop Grumman
     • Stanford (Max & Mindy)
  36. ImmPort data is already being tagged
     For example
     • where data is prepared to meet FDA requirements
     • where data is published to meet NIH mandates for reusability
     • in the post-submission phase, where data is analyzed by third parties
     But this tagging
     • is partial
     • is uncoordinated
     • uses ontologies and analysis tools of varying quality
  37. Ought implies can
     Complete clinical trial data can be made freely available, in de-identified form
     But to be useful these data need to be discoverable and analyzable
     Which means: standardization
  38. Two alternative strategies for standardization
     • 1. via consensus-based ontologies adapted to the needs of trialists
     • 2. via FDA (CDISC) standards
  39. Advantages of pre-coordination with ontologies
     • Better quality of data for all Maxes and Mindies
     • Enhanced discoverability of data
     • Cost-free submission of data to ImmPort
     • Works even for those trials which have nothing to do with FDA
     • Allows incremental strategy
     • Leads to immediate integration with bioinformatics data sources
  40. Immune-Related Ontologies (examples)
     • Protein Ontology (PRO)
     • Gene Ontology (GO)
     • Cell Ontology (CL)
     • Immune Epitope Ontology
     • Beta Cell Genomics Ontology
     • Infectious Disease Ontology
     • Allergy Ontology
     • Antibody Ontology
     • CDISC2RDF
     • CL+ (for CyTOF)
     • Cytokine Ontology
     • Immunology Ontology
     • VDJ Ontology
     http://ncorwiki.buffalo.edu/index.php/Immunology_Ontologies
  41. The very same ontological framework will work not just for BISC but also for the NIAID BRCs
  42. An example of how this will work: The ImmPort Antibody Registry/Ontology
     Experimental methods typically report antibody clones or target markers using non-standardized terminology:
     • CD3e, CD3E, CD3ɛ, CD3 epsilon (protein names)
     • HIT3e vs. UCHT1 (antibody clones for CD3e)
     • 550367 vs. 300401 (catalog numbers for anti-CD3e antibody reagents)
     Even catalog numbers have a half-life as concerns the information they provide
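The kind of lookup a registry makes possible can be sketched as a synonym table keyed on a normalized form of each term. This is only an illustration using the terms listed on the slide: the canonical label `"CD3e"` is a placeholder rather than an actual registry identifier, and the normalization rules and function names are assumptions, not the ImmPort Antibody Registry's design.

```python
from typing import Optional

# Placeholder canonical label; a real registry would use a stable identifier.
CANONICAL_TARGET = "CD3e"

# Terms taken from the slide: protein names, antibody clones, catalog numbers.
KNOWN_TERMS = [
    "CD3e", "CD3E", "CD3ɛ", "CD3 epsilon",  # protein name variants
    "HIT3e", "UCHT1",                        # antibody clones for CD3e
    "550367", "300401",                      # catalog numbers for anti-CD3e reagents
]


def _key(term: str) -> str:
    """Collapse case, spacing and epsilon spellings into one lookup key."""
    key = term.strip().casefold().replace(" ", "")
    for glyph in ("ɛ", "ε"):  # Latin open e and Greek epsilon
        key = key.replace(glyph, "e")
    return key.replace("epsilon", "e")


SYNONYMS = {_key(term): CANONICAL_TARGET for term in KNOWN_TERMS}


def resolve_target(term: str) -> Optional[str]:
    """Return the canonical target for a reported term, or None if unknown."""
    return SYNONYMS.get(_key(term))


print([resolve_target(t) for t in ["cd3 Epsilon", "UCHT1", "550367", "CD4"]])
```

A static table like this shares the weakness the slide points out for catalog numbers: entries go stale, so a maintained registry would need versioned, curated mappings.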
  43. ImmPort Antibody Registry (Diehl, et al)
     from BD Lyoplate Screening Panels Human Surface Markers
  44. Semantic Query / Discoverability
     • Find all experiments in which IL2 mRNA levels were quantified
       Infers that IL2 mRNA is the analyte and that SAGE, QPCR and microarrays are appropriate measurement techniques
     • Find all experiment samples that include samples from subjects with diseases like Type 1 diabetes
       Infers that the source of the biological sample used must be a human subject with Type 1 diabetes mellitus, Graves' disease or other autoimmune diseases of endocrine glands
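The second query on this slide relies on a class hierarchy: "diseases like Type 1 diabetes" is read as "diseases under the same parent class". A minimal sketch of that inference follows. The hierarchy is a toy one written for illustration, not an excerpt from any real ontology, and the sample records and identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Toy is_a hierarchy (child -> parent); illustrative only.
PARENT: Dict[str, str] = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "asthma": "respiratory system disease",
}

# Hypothetical sample annotations: (sample id, disease of the subject).
SAMPLES: List[Tuple[str, str]] = [
    ("S1", "type 1 diabetes mellitus"),
    ("S2", "Graves' disease"),
    ("S3", "asthma"),
]


def is_a(term: str, cls: str) -> bool:
    """True if term equals cls or has cls among its ancestors."""
    while term != cls:
        if term not in PARENT:
            return False
        term = PARENT[term]
    return True


def diseases_like(term: str) -> Set[str]:
    """All known terms under the immediate parent class of `term`."""
    parent = PARENT.get(term)
    if parent is None:
        return {term}
    return {t for t in PARENT if is_a(t, parent)} | {term}


def find_samples_like(term: str) -> List[str]:
    """Sample ids whose subject has a disease 'like' the given one."""
    matches = diseases_like(term)
    return [sample_id for sample_id, disease in SAMPLES if disease in matches]


print(find_samples_like("type 1 diabetes mellitus"))
```

Walking one level up the hierarchy is a deliberate simplification; how far "like" should generalize is a modelling decision that a real query system would have to make explicit.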
  45. Second strategy: coordination through FDA (CDISC) standards
     Currently, PIs may need to reformat twice, once for ImmPort, once for FDA
     Coordination via ontologies would require mappings from these ontologies to CDISC standards
     But there is an alternative strategy: have all trialists use CDISC standards
     • Map the CDISC standards to common ontologies
  46. Problems with the FDA (CDISC) strategy
     • They have not been developed to support computation across biological data
     • They are very slow to evolve (> 14 years so far)
     • They are designed to meet the needs of data managers rather than bioinformaticians
     • They lack compositionality (hard to integrate with other data)
     • They are very complicated and so typically not in fact used by trialists; rather they are generated by software (for example Medidata) – with some loss in data quality (?) (through hard work?)
  47. BRIDG 3.2 Domain Analysis Model
  48. Strategy
     • Identify useful standards and build them into the clinical trial management systems and laboratory information management systems (such as LabKey) that the PIs will be using in any case?
     • Join with Ravi Shankar and with the PHUSE (EU, Roche, AstraZeneca, FDA, …) project to incorporate ontology technology into CDISC
