Presentation to the Clinical and Research Ethics Seminar, Clinical and Translational Science Center, Buffalo, January 21, 2014
https://immport.niaid.nih.gov/
http://youtu.be/booqxkpvJMg
2. Publish all drug trial results, says Dr
Ben Goldacre, 19 June 2013:
http://www.bbc.co.uk/news/uk-politics-22957195
3. John Holdren, Director of the Office of Science
and Technology Policy, “has directed Federal
agencies with more than $100M in R&D
expenditures to develop plans to make the
published results of federally funded research
freely available to the public within one year of
publication and requiring researchers to better
account for and manage the digital data
resulting from federally funded scientific
research.”
5. Can we take care of the problems here,
at least prospectively?
6. Paper is giving way to digitalized data
• As more studies come on-line, the problems involved
in making them available to automated analysis will
only get worse
• What we need is prospective standardization of a
useful sort – using standards that will make the life of
trialists easier and also increase the value of the data
they produce for secondary use
13. DAIT-funded Projects Depositing Data Into
ImmPort
• Immune Tolerance Network (ITN)
• Atopic Dermatitis Research Network (ADRN)
• Population Genetics Analysis Program
• HLA Region Genetics in Immune-Mediated Diseases
• Clinical Trials in Organ Transplantation in Children
(CTOT-C)
• Consortium of Food Allergy Research (CoFAR)
• Renal and Lung Living Donors Evaluation Study (RELIVE)
• The Inner City Asthma Consortium (ICAC)
14. ImmPort Team
Northrop Grumman Information
Technology Health Solutions
Stanford University
Atul Butte (PI)
Mark Davis (co-PI)
Garry Nolan (co-PI)
Ravi Shankar
University at Buffalo
Barry Smith (co-PI)
Alex Diehl
Alan Ruttenberg
Technion Israel Institute of Technology
Shai Shen-Orr
15. Why do we want the data to be free?
• Education
• Replication of results
• Scientific scrutiny / economy
• Secondary use
• New biomedical discovery, including DIY science
• New –omics/ Big Data start-ups
• Reanalysis of original results
o Linking existing trial data to new bioinformatics discoveries
o Harvesting existing trial data by creating new, virtual metatrials
16. Tutorial and Workshop
Ontology and Imaging
Informatics
• Would the training of pathologists (or other
professionals) change if hundreds/thousands of
trial-labeled images were publicly available?
19. Common follow-up time points for 4
studies
PRELIMINARY
5 time points: time 0, 3, 6, 12, and 24 months post-transplant
are in common
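Finding the shared follow-up time points amounts to intersecting the visit schedules of the studies. A minimal sketch, assuming each schedule is available as a set of months post-transplant; the study names and the non-shared time points below are invented for illustration and are not taken from the four studies:

```python
# Hypothetical follow-up schedules (months post-transplant) for four studies.
# Study names and the non-shared time points are made up for this sketch.
schedules = {
    "study_A": {0, 1, 3, 6, 12, 18, 24},
    "study_B": {0, 3, 6, 9, 12, 24},
    "study_C": {0, 3, 6, 12, 24, 36},
    "study_D": {0, 2, 3, 6, 12, 24},
}


def common_time_points(schedules):
    """Return the sorted time points that occur in every study's schedule."""
    return sorted(set.intersection(*schedules.values()))


common = common_time_points(schedules)
print(common)
```

The intersection is only meaningful once every study expresses its time points in the same unit and against the same reference event, which is the standardization problem the following slides describe.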
20. ITN Data (with thanks to Ravi Shankar)
• Flow Cytometry data (yellow)
• PCR data (green)
• Study Protocol, Operational data, Clinical data (blue)
• HLA data (purple)
• Specimen Management Data (green)
21. What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
Protocol
Group
Schedule
of Events
Assay
Group
0
Specimen
Table
0
22. What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
CRO
CRF
Protocol
Group
Schedule
of Events
Assay
Group
0
Specimen
Table
0
Day 0, Transplant
23. What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
CRO
CRF
Protocol
Group
Schedule
of Events
Assay
Group
0
0
Specimen
Table
Operations
Group
Tube
Table
v0
Day 0, Transplant
24. What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
CRO
Day 0, Transplant
CRF
Protocol
Group
Schedule
of Events
Assay
Group
0
0
Tube
Manufacturer
Specimen
Table
Operations
Group
Tube
Table
v0
Kit
Report
Cimarron
v0
ImmunoTrak
v0, Visit 0
25. What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
CRO
Day 0, Transplant
Core
Labs
CRF
v0
Assays
Protocol
Group
Schedule
of Events
Assay
Group
0
0
Tube
Manufacturer
Specimen
Table
Operations
Group
Tube
Table
v0
Kit
Report
Cimarron
v0
ImmunoTrak
v0, Visit 0
26. What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
CRO
Day 0, Transplant
Core
Labs
CRF
v0
Assays
Protocol
Group
Schedule
of Events
Assay
Group
0
0
Tube
Manufacturer
Specimen
Table
Operations
Group
Tube
Table
Data
Center
v0
Database
Kit
Report
Cimarron
v0
ImmunoTrak
v0, Visit 0
27. What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
CRO
Day 0, Transplant
Core
Labs v0
CRF
Assays
Data
Center
Assay
Group 0
Protocol
0
Group
Specimen
Table
Tube
Manufacturer v 0
Database
Kit
Report
Schedule
of Events
Operations
v0
Group
Tube
Table
Cimarron v0, Visit 0
ImmunoTrak
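The variant labels on these slides could in principle be reconciled after the fact by a small normalizer. The following is a minimal sketch, not part of ImmPort: the function name `normalize_visit` and the alias table are hypothetical, and treating "Day 0" and "Transplant" as visit 0 is an assumption that holds only where the transplant defines the first visit.

```python
import re

# Event names that stand in for a visit number.
# Assumption: in these studies the transplant itself is visit 0.
EVENT_ALIASES = {"transplant": 0}


def normalize_visit(label):
    """Map a free-text visit label to an integer visit number, or None."""
    text = label.strip().lower()
    if text in EVENT_ALIASES:
        return EVENT_ALIASES[text]
    # Accept "0", "v0", "v 0", "visit 0", "day 0", optionally followed by
    # a comma and an event name, as in "Day 0, Transplant".
    # Simplification: day numbers and visit numbers are treated alike,
    # which is only safe for the visit-0 example shown on the slides.
    match = re.fullmatch(r"(?:visit|day|v)?\s*(\d+)(?:\s*,.*)?", text)
    return int(match.group(1)) if match else None


labels = ["Visit 0", "v0", "v 0", "0", "Day 0, Transplant", "Transplant"]
normalized = [normalize_visit(label) for label in labels]
print(normalized)
```

Every new source (CRO, core lab, kit manufacturer, data center) adds cases to a normalizer of this kind, which is why this sort of repair falls under "hard work" rather than a scalable solution.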
28. Mappings between protocol, lab tests and
mechanistic assays were missing
Lab Tests (Study Time collected)
Allergy Score (Study Collection Day)
Microarray Data (Only Visit)
Flow (Collection_Study_day and Visit)
29. How are these problems currently being solved?
Hard work
Problems with hard work:
•Does not scale
•Does not comport with the vision underlying
ImmPort – that we can transform clinical medicine
from an art into an (information-driven) science,
based on repeatable processes documented in
advance
30. Goals of ImmPort
• Accelerate a more collaborative and coordinated
research environment
• Create an integrated database that broadens the
usefulness of scientific data
• Advance the pace and quality of scientific discovery
• Integrate relevant data sets from participating
laboratories, public and government databases, and
private data sources
• Promote rapid availability of important findings
• Provide analysis tools to advance immunological
research
33. Alternatives to the strategy of hard work
PIs, hospitals, biostatisticians: perform study & collect data; analyze data (SAS …); submit data to ImmPort
Northrop Grumman: process & deidentify data in ImmPort
Stanford (Max & Mindy): discover, aggregate, analyze data in ImmPort
34. What we have currently
PIs, hospitals, biostatisticians,
CROs …
Northrop
Grumman
Stanford
(Max &
Mindy)
Lots of free text, local formats,
local standards, local
terminologies operating here
35. Semantic Web strategy of post-coordination
via arms-length enhancement of data
PIs, hospitals, biostatisticians,
CROs …
Lots of free text, local formats,
local standards, local
terminologies operating here
Northrop
Grumman
Stanford
(Max &
Mindy)
uniform standards
and ontologies
applied post hoc
36. The problem with this post hoc strategy is that it
still requires the same amount of hard work
PIs, hospitals, biostatisticians,
CROs …
Lots of free text, local formats,
local standards, local
terminologies operating here
Northrop
Grumman
Stanford
(Max &
Mindy)
uniform standards
and ontologies
applied post hoc
37. A preferred but much more ambitious
strategy: Pre-coordination
Identify uniform
standards that can be
applied already here
PIs, hospitals, biostatisticians,
CROs …
Northrop
Grumman
Stanford
(Max &
Mindy)
38. ImmPort data is already being tagged
For example
•where data is prepared to meet FDA requirements
•where data is published to meet NIH mandates for
reusability
•in the post-submission phase, where data is
analyzed by third parties
But this tagging is
•partial
•uncoordinated
•uses ontologies and analysis tools of varying quality
39. Ought implies can
Complete clinical trial data can be made
freely available, in de-identified form
But to be useful these data need to be
discoverable and analyzable
Which means: standardization
40. Two alternative strategies for standardization
• 1. via consensus-based ontologies adapted to the needs of
trialists
• 2. via FDA (CDISC) standards
41. Advantages of pre-coordination with
ontologies
• Better quality of data for all Maxes and Mindies
• Enhanced discoverability of data
• Cost-free submission of data to ImmPort
• Works even for those trials which have nothing to do with
FDA
• Allows incremental strategy
• Leads to immediate integration with bioinformatics data
sources
43. The very same ontological framework will work not just
for BISC but also for the NIAID BRCs
44. An example of how this will work:
The ImmPort Antibody Registry/Ontology
Experimental methods typically report antibody clones
or target markers using non-standardized terminology:
CD3e, CD3E, CD3ɛ, CD3 epsilon (protein names)
HIT3e vs. UCHT1 (antibody clones for CD3e)
550367 vs. 300401 (catalog numbers for anti-CD3e
antibody reagents)
Even catalog numbers have a half-life as concerns
the information they provide
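A registry of this kind resolves the variant spellings to one canonical entry. The sketch below is illustrative only and does not reflect the actual ImmPort Antibody Registry: the `REGISTRY` table, the canonical label "CD3E" and the function name `normalize_marker` are assumptions.

```python
# Hypothetical registry: normalized key -> canonical marker label.
# The canonical label "CD3E" is chosen for illustration only.
REGISTRY = {"cd3e": "CD3E"}

# Spellings of "epsilon" seen in free text: the word itself, the Latin
# small open e (U+025B) and the Greek small epsilon (U+03B5).
EPSILON_FORMS = ("epsilon", "\u025b", "\u03b5")


def normalize_marker(name):
    """Return the canonical label for a marker name, or None if unknown."""
    text = name.casefold()
    for form in EPSILON_FORMS:
        text = text.replace(form, "e")
    key = "".join(text.split())  # drop all whitespace
    return REGISTRY.get(key)


variants = ["CD3e", "CD3E", "CD3\u025b", "CD3 epsilon"]
resolved = [normalize_marker(v) for v in variants]
print(resolved)
```

Clone names and catalog numbers cannot be reconciled by string rules like these; they need curated registry entries, which is the role an ontology-backed registry plays.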
46. Semantic Query / Discoverability
Find all experiments in which IL2 mRNA levels were quantified
Infer that IL2 mRNA is analyte and SAGE, QPCR and microarrays are appropriate
measurement techniques
Find all experiments that include samples from subjects with diseases like Type
1 diabetes
Infer that the source of the biological sample used must be a human subject with
Type 1 diabetes mellitus, Graves' disease or other autoimmune diseases of
endocrine glands
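The second query depends on walking an is-a hierarchy: "diseases like Type 1 diabetes" is read as "any disease under the same parent class". A minimal sketch under that reading follows; the hierarchy fragment, the sample records and the function names are hypothetical, and a real system would load the classes from a disease ontology rather than a hand-written table.

```python
# Hypothetical is-a fragment: child term -> parent term.
PARENT = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "rheumatoid arthritis": "autoimmune disease",
}

# Hypothetical sample records: (sample id, subject diagnosis).
SAMPLES = [
    ("S1", "type 1 diabetes mellitus"),
    ("S2", "Graves' disease"),
    ("S3", "rheumatoid arthritis"),
]


def is_a(term, ancestor):
    """True if term equals ancestor or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def samples_with(disease_class):
    """Return ids of samples whose diagnosis falls under disease_class."""
    return [sid for sid, dx in SAMPLES if is_a(dx, disease_class)]


hits = samples_with("autoimmune disease of endocrine gland")
print(hits)
```

The query names only the parent class; the Graves' disease sample is returned because the hierarchy supplies the link, not because the query string mentions it.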
47. Second strategy: coordination through FDA
(CDISC) standards
Currently, PIs may need to reformat twice, once for ImmPort,
once for FDA
Coordination via ontologies would require mappings from these
ontologies to CDISC standards
But there is an alternative strategy: have all trialists use CDISC
standards
•Map the CDISC standards to common ontologies
48. Problems with the FDA (CDISC) strategy
• They have not been developed to support computation across
biological data
• They are very slow to evolve (> 14 years so far)
• They are designed to meet the needs of data managers rather
than bioinformaticians
• They lack compositionality (hard to integrate with other data)
• They are very complicated and so typically not in fact used by
trialists; rather they are generated by software (for example
Medidata) – with some loss in data quality (?) (through hard
work?)
51. Strategy
• Identify useful standards and build them into the
clinical trial management systems and laboratory
information management systems (such as LabKey)
that the PIs will be using in any case?
• Join with Ravi Shankar and with the PHUSE (EU,
Roche, AstraZeneca, FDA, …) project to
incorporate ontology technology into CDISC