Demo of end-to-end federation of human studies design data using semantic web approaches and the Ontology of Clinical Research as the reference semantics.
Human Studies Database Project (demo)
1. Human Studies Database Project. CTSA Informatics All Hands Meeting, October 13, 2011. Ida Sim, UCSF, for the HSDB team. Funding: CTSAs and R01-RR-026040.
5. Go for the Gold? Main Results Table (two columns, as shown on the slide):
   2.7 (1.1 - 4.1)   | 2.2 (1.7-3.4)
   121 (99-129)      | 110 (87-134)
   0.91 (0.93-1.04)  | 0.83 (0.79-0.99)
   45.1 (39.9-50.5)  | 46.4 (39.2-51.2)
12. 6 HIV and 186 Studies
   Table columns: Studies; Study design; Intervention/Factor; Primary outcome(s); Secondary outcome(s). Outcomes below are listed in slide order (primary first, then secondary).
   - Taha (#1, UCSF*). Design: parallel group, randomized. Intervention: Arm 1: metronidazole + erythromycin; Arm 2: placebo. Outcomes: infant HIV infection at 4-6 wks; composite of infant HIV infection and mortality, at 1 year of age; infant HIV infection at 24-48 hours, and 12 months; etc.
   - Metzger (#2, UCSF*). Design: parallel group, randomized. Intervention: Arm 1: buprenorphine/naloxone 3 wks + 52 wks; Arm 2: buprenorphine/naloxone max 18 days; all arms: counseling. Outcomes: HIV-1 infection or death, at 104 week visit; death, through week 156; HIV-1 infection every 6 months at scheduled follow-up visits; etc.
   - German (#3, Hopkins). Design: cohort. Factor: HIV status at baseline. Outcomes: recognized HIV infection, Wave 2 (3 years); unrecognized HIV infection, Wave 2 (3 years); etc.
   - Wawer (#4, Hopkins). Intervention: Arm 1: immediate circumcision; Arm 2: delayed circumcision. Outcomes: male-to-female HIV transmission, throughout study.
   - El-Sadr (#5, UCSF*). Design: cohort. Factor: assigned to drug conservation (DC) arm or to viral suppression (VS) arm in the SMART study. Outcomes: HIV transmission risk behavior, end of study; HIV transmission risk behavior in participants who are not on ART at enrollment, end of study.
   - Cohen (#6, UCSF*). Design: cohort. Factor: HIV-1 infection status: proven acute, established, or uninfected. Outcomes: prevalence of acute HIV infection, throughout study; etc.
   - Rockefeller: 186 studies (interventional and observational).
13. [Architecture diagram] Data Sources, Local Servers, Query Integrator. Data sources: Registry XML; Protocol Documents (manual, bulk upload); Electronic IRB (iMedRIS), via XML auto-generation. Local servers: Johns Hopkins, Rockefeller, UCSF (AWS). Schema: OCRe-XSD, from OCRe.
34. RU HSDB Data Mapping Workflow
   Components: Oracle DB (RU iMedRIS); HSDB xsd schema; SQL Mapper; XML Generator; RU HSDB xml.
   SQL Mapper: 1. Maps data elements using xsd. 2. Transforms extracted data using analytics (data conversion, masking, concatenation, etc.).
   XML Generator: 1. Links IORG number. 2. Cleans textual data. 3. Generates xml file.
   Arrow labels: generates rules for mapping data elements; extracts data elements; generates data elements table.
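The two stages on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions: the table name `protocols`, the column names, the target element names, and the placeholder IORG value are all hypothetical, and an in-memory SQLite database stands in for the Oracle DB. It is not the RU implementation.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: source column -> (target XML element, transform).
# In the real workflow these rules would be derived from the HSDB xsd.
MAPPING_RULES = {
    "protocol_title": ("Title", str.strip),
    "design": ("StudyDesign", lambda s: s.strip().lower()),  # data conversion
    "pi_name": ("Investigator", lambda s: "MASKED"),         # masking
}


def extract_rows(conn):
    """SQL Mapper: extract data elements as plain dict rows."""
    conn.row_factory = sqlite3.Row
    return [dict(r) for r in conn.execute("SELECT * FROM protocols")]


def map_row(row):
    """SQL Mapper: apply the mapping rules, skipping missing values."""
    mapped = {}
    for column, (element, transform) in MAPPING_RULES.items():
        value = row.get(column)
        if value is not None:
            mapped[element] = transform(value)
    return mapped


def generate_xml(mapped, iorg):
    """XML Generator: link the IORG number, clean text, emit XML."""
    root = ET.Element("Study", attrib={"iorg": iorg})
    for element, text in mapped.items():
        # Cleaning step: collapse runs of whitespace.
        ET.SubElement(root, element).text = " ".join(text.split())
    return ET.tostring(root, encoding="unicode")


def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for the Oracle DB
    conn.execute(
        "CREATE TABLE protocols (protocol_title TEXT, design TEXT, pi_name TEXT)"
    )
    conn.execute(
        "INSERT INTO protocols VALUES (?, ?, ?)",
        ("  Example   trial ", " Cohort ", "Jane Doe"),
    )
    rows = extract_rows(conn)
    # "IORG0000000" is a placeholder, not a real registration number.
    return [generate_xml(map_row(r), iorg="IORG0000000") for r in rows]
```

Keeping the rules in one table mirrors the slide's split between "generates rules for mapping data elements" and the generator that only cleans and serializes.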
40. [Query Integrator architecture diagram] Remote services: BioPortal REST services (SNOMED); UCSF HSDB Data (XML); OCRe in OWL (RDF/OWL). QI "Plugins": vSPARQL Service, DXQuery Service, Other Services. QI Server: QI Core, QES, Query Database, RDF Store. Clients: QI Client, Other Clients.
55. Retrieve all placebo-controlled randomized trials:
   - Find all study designs matching the OCRe ID for “parallel group”.
   - Find allocation schemes under OCRe’s “random allocation” hierarchy.
   - Find interventions = SNOMED code for “placebo” [182886004].
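As a rough sketch of how these three conditions combine, the snippet below filters a toy XML document. The element names, attribute names, and `OCRE:` identifiers are hypothetical stand-ins (the real data conforms to the OCRe-derived XSD and the real demo runs through Query Integrator); only the placebo code comes from the slide.

```python
import xml.etree.ElementTree as ET

PLACEBO_SNOMED = "182886004"  # SNOMED code quoted on the slide

# Hypothetical identifiers standing in for OCRe IDs and for the set of
# allocation schemes found under "random allocation".
PARALLEL_GROUP_ID = "OCRE:parallel_group"
RANDOM_ALLOCATION_IDS = {"OCRE:random_allocation", "OCRE:block_randomization"}

# Toy data in an invented layout, for illustration only.
SAMPLE = """
<Studies>
  <Study id="s1">
    <Design ocre="OCRE:parallel_group"/>
    <Allocation ocre="OCRE:block_randomization"/>
    <Intervention snomed="182886004"/>
    <Intervention snomed="0000"/>
  </Study>
  <Study id="s2">
    <Design ocre="OCRE:cohort"/>
  </Study>
  <Study id="s3">
    <Design ocre="OCRE:parallel_group"/>
    <Allocation ocre="OCRE:random_allocation"/>
    <Intervention snomed="1111"/>
  </Study>
</Studies>
"""


def placebo_controlled_rcts(xml_text):
    """Return IDs of studies meeting all three conditions from the slide."""
    root = ET.fromstring(xml_text)
    hits = []
    for study in root.iter("Study"):
        designs = {d.get("ocre") for d in study.findall("Design")}
        allocations = {a.get("ocre") for a in study.findall("Allocation")}
        codes = {i.get("snomed") for i in study.findall("Intervention")}
        if (
            PARALLEL_GROUP_ID in designs
            and allocations & RANDOM_ALLOCATION_IDS
            and PLACEBO_SNOMED in codes
        ):
            hits.append(study.get("id"))
    return hits
```

Here `s3` is randomized and parallel-group but has no placebo arm, so only `s1` qualifies.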
58. All studies with HIV Infection as any single-variable outcome. Uses the same BioPortal SNOMED subsumption call as the Macrolide query, and matches any outcome variable code to the SNOMED ID for HIV Infection or its children.
59. All studies with HIV infection as a Primary single-variable outcome. Calls the query for HIV infection as a single-variable outcome with outcome priority = Primary.
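The relationship between the two outcome queries can be sketched as one function with an optional priority filter. This is a minimal illustration, not the Query Integrator query itself: the study records, the placeholder codes, and the `target_codes` set (standing in for "HIV Infection or its children" as returned by the subsumption call) are all hypothetical.

```python
def studies_with_outcome(studies, target_codes, priority=None):
    """Return IDs of studies with an outcome coded in target_codes.

    If priority is given (e.g. "primary"), only outcomes with that
    priority count as a match.
    """
    hits = []
    for study in studies:
        for outcome in study["outcomes"]:
            code_matches = outcome["code"] in target_codes
            priority_matches = priority is None or outcome["priority"] == priority
            if code_matches and priority_matches:
                hits.append(study["id"])
                break  # one matching outcome is enough
    return hits


# Hypothetical records and placeholder codes, not real SNOMED identifiers.
EXAMPLE_STUDIES = [
    {"id": "study-1", "outcomes": [
        {"code": "HIV", "priority": "primary"},
        {"code": "HIV", "priority": "secondary"},
    ]},
    {"id": "study-2", "outcomes": [
        {"code": "HIV-child", "priority": "secondary"},
    ]},
    {"id": "study-3", "outcomes": [
        {"code": "OTHER", "priority": "primary"},
    ]},
]
EXAMPLE_TARGETS = {"HIV", "HIV-child"}
```

Calling the function without `priority` corresponds to the "any outcome" query; passing `priority="primary"` narrows it to the second query.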
60. Primary outcome is HIV Infection at 4-6 weeks. HIV Infection at 24-48 hours and at 12 months are Secondary outcomes.
Editor's Notes
Our project objective is
The gold is results data, so start sharing there?
Standardized metadata isn’t enough. We need to know more about the study protocol. Were these outcomes…Probably baseline, but do you KNOW?
But still you don’t know enough. What do the columns represent? RCT of garlic vs. chocolate on weight loss? An observational study of garlic vs. chocolate on slowing renal failure?
To make sense of Context around Results Table … not study execution or research administration
These studies are in XML, conformant to an XSD that is automatically generated from OCRe. The XML data is on 3 separate servers, and we are going to query over them using the Query Integrator from the U. Washington.
The Quick Pass Demo: XQuery --
In original studies, observations are acquired directly on or from the study participants (including individual participant-level data from databases, and participant-level meta-analysis). In meta studies, observations are acquired from journal articles, abstracts, etc., reporting on other studies.
To give you a sense of how OCRe is set up: (the generic set of possible annotations is defined in 'export_annotation_def.owl')
Protocols have variables (outcome variables, factor or treatment assignment variables). For this demo, we’ll be looking at HIV Infection as a study outcome variable. Variables may be described by a Clinical Descriptor, like an ID from a code system. Outcome variables may be primary or secondary, and may be assessed at one or more timepoints. Each variable plays the role of an independent or dependent variable in one or more statistical analyses. For example, the main analysis of an interventional study has as its independent variable the assignment variable (to a macrolide or placebo), and as its dependent variable the primary outcome.
Variables may also be derived from other variables, e.g., composite outcomes, averages, etc.
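One way to picture these relationships is as a few small record types. The sketch below is only an illustration: the class and field names are invented for this note and are not OCRe class names, and the example values are taken from the Taha study as described on the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variable:
    name: str
    code: Optional[str] = None  # clinical descriptor, e.g. an ID from a code system
    derived_from: List["Variable"] = field(default_factory=list)


@dataclass
class Outcome:
    variable: Variable
    priority: str  # "primary" or "secondary"
    timepoints: List[str] = field(default_factory=list)


@dataclass
class Analysis:
    independent: Variable  # e.g. the assignment variable
    dependent: Variable    # e.g. the primary outcome variable


@dataclass
class Protocol:
    outcomes: List[Outcome]
    analyses: List[Analysis]

    def primary_outcomes(self):
        return [o for o in self.outcomes if o.priority == "primary"]


def example_protocol():
    """Build a toy protocol loosely following the Taha study on the slides."""
    assignment = Variable("assignment (macrolide or placebo)")
    hiv = Variable("HIV Infection")  # code left unset; no real ID is assumed here
    return Protocol(
        outcomes=[
            Outcome(hiv, "primary", ["4-6 weeks"]),
            Outcome(hiv, "secondary", ["24-48 hours", "12 months"]),
        ],
        analyses=[Analysis(independent=assignment, dependent=hiv)],
    )
```

The `derived_from` field is where a composite or averaged variable would point back to its source variables.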
We talked already about the OCRe Import graph. For data acquisition, we’ve decided that the normative form of HSDB data will be RDF. But it is a big jump from relational databases to RDF for most institutions. Therefore, we’re defining an intermediate step, OCRe in XSD, as a target for institutions to take their data from relational to XML. We then need to go from XML to RDF; we are not yet sure how. Once the data is in RDF, we can do logical curation and use various tools, including WebProtege and/or VITRO, to build curation interfaces. The RDF data can then be queried using Query Integrator, which accesses BioPortal for terminologies and value sets.
For
Interactive Matrix Language http://www.si.washington.edu/projects/QI http://sig.biostr.washington.edu/projects/queryintegrator
For now, we manually supply the macrolide SNOMED code. The query then asks the BioPortal REST service for all children of the macrolide term. Query 221, in turn, makes a REST query to get paths to leaves (passing in the root): it calls BioPortal, which returns the paths from Macrolide to its leaf subclasses, then parses the results and pulls out all the leaf and interim SNOMED codes.
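The last step, pulling leaf and interim codes out of the returned paths, can be sketched as below. This assumes each path has already been parsed into a list of codes running from some ancestor down to a leaf; the function name, the input shape, and the placeholder codes are hypothetical, and no BioPortal call is made here.

```python
def codes_from_paths(paths, root_code):
    """Split the codes below root_code into leaf and interim sets.

    paths: list of lists of codes, each ordered from ancestor to leaf.
    Paths that do not pass through root_code are ignored.
    """
    leaves, interim = set(), set()
    for path in paths:
        if root_code not in path:
            continue
        below = path[path.index(root_code) + 1:]
        if not below:
            continue  # the root itself is the end of this path
        leaves.add(below[-1])
        interim.update(below[:-1])
    return {
        "leaves": sorted(leaves),
        "interim": sorted(interim),
        # Root plus everything below it: the set a subsumption match would use.
        "all": sorted({root_code} | leaves | interim),
    }


# Placeholder codes for illustration, not real SNOMED identifiers.
EXAMPLE_PATHS = [
    ["TOP", "M", "A", "A1"],
    ["TOP", "M", "A", "A2"],
    ["TOP", "M", "B"],
    ["TOP", "X", "X1"],
]
```

With `root_code="M"`, the example yields leaves `A1`, `A2`, `B` and the single interim code `A`; the unrelated `X` branch is skipped.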