Human Studies Database Project (demo)
Demo of end-to-end federation of human studies design data using semantic web approaches and the Ontology of Clinical Research as the reference semantics.

Published in: Education, Technology

  • Our project objective is
  • The gold is results data, so start sharing there?
  • Standardized metadata isn’t enough. We need to know more about the study protocol. Were these outcomes…Probably baseline, but do you KNOW?
  • But still you don’t know enough. What do the columns represent? RCT of garlic vs. chocolate on weight loss? An observational study of garlic vs. chocolate on slowing renal failure?
  • To make sense of Context around Results Table … not study execution or research administration
  • These studies are in XML, conformant to an XSD that is automatically generated from OCRe. The XML data is on 3 separate servers, and we are going to query over them using the Query Integrator from the U. Washington.
  • The Quick Pass Demo: Xquery --
  • In original studies, observations are acquired directly on or from the study participants (including individual participant-level data from databases, and participant-level meta-analysis). In meta studies, observations are acquired from journal articles, abstracts, etc. reporting on other studies.
  • To give you a sense of how OCRe is set up: the generic set of possible annotations is defined in 'export_annotation_def.owl'.
  • Protocols have variables (outcome variables, factor or treatment assignment variables). For this demo, we’ll be looking at HIV Infection as a study outcome variable. Variables may be described by a Clinical Descriptor, like an ID from a code system. Outcome variables may be primary or secondary, and may be assessed at one or more timepoints. Each variable plays the role of an independent or dependent variable in one or more statistical analyses. For example, the main analysis of an interventional study has as its independent variable the assignment variable (to a macrolide or placebo), and as its dependent variable the primary outcome.
  • Variables may also be derived from other variables, e.g., composite outcomes, averages, etc.
  • We talked already about the OCRe Import graph. For data acquisition, we’ve decided that the normative form of HSDB data will be RDF. But it is a big jump from relational databases to RDF for most institutions. Therefore, we’re defining an intermediate step, OCRe in XSD, as a target for institutions to take their data from relational to XML. We then need to go from XML to RDF; we are not quite sure how yet. Once the data is in RDF, we can do logical curation and use various tools, including WebProtege and/or VITRO, to build curation interfaces. The RDF data can then be queried using Query Integrator, which accesses BioPortal for terminologies and value sets.
  • For
  • Interactive Matrix Language http://www.si.washington.edu/projects/QI http://sig.biostr.washington.edu/projects/queryintegrator
  • We manually supply the macrolide SNOMED code for now. The query calls the BioPortal REST service for all children of the macrolide term. Query 221 in turn issues a REST query to get the paths to leaves (passing in the root): it calls BioPortal, which returns the paths from Macrolide to its leaf subclasses, then parses the results and pulls out all the leaf and interim SNOMED codes.
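The chained query described in the last note can be sketched offline as a walk over a parent-to-children map. This is a minimal sketch, not the Query Integrator or BioPortal implementation: the `CHILDREN` map stands in for the BioPortal response, and every code in it except the macrolide root 428787002 is a made-up placeholder.

```python
from typing import Dict, List, Set

MACROLIDE = "428787002"  # SNOMED ID for macrolide, as given on the slides

# Hypothetical stand-in for the hierarchy returned by BioPortal.
# The child codes below are placeholders, not real SNOMED codes.
CHILDREN: Dict[str, List[str]] = {
    MACROLIDE: ["child-A", "child-B"],
    "child-A": ["leaf-A1", "leaf-A2"],
    "child-B": [],
}


def paths_to_leaves(root: str, children: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from the root term down to a leaf term."""
    kids = children.get(root, [])
    if not kids:
        return [[root]]
    paths = []
    for kid in kids:
        for tail in paths_to_leaves(kid, children):
            paths.append([root] + tail)
    return paths


def codes_on_paths(paths: List[List[str]]) -> Set[str]:
    """Collect the root, interim, and leaf codes that appear on the paths."""
    return {code for path in paths for code in path}


macrolide_paths = paths_to_leaves(MACROLIDE, CHILDREN)
macrolide_codes = codes_on_paths(macrolide_paths)
```

The resulting code set is what a later query would match against intervention codes in the study XML.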
  • Transcript

    • 1. Human Studies Database Project CTSA Informatics All Hands Meeting October 13, 2011 Ida Sim, UCSF, for the HSDB team Funding: CTSAs and R01-RR-026040
    • 2. The HSDB Team
      • Jim Brinkley, U Wash
      • Simona Carini, UCSF
      • Todd Detwiler, U Wash
      • Harold Lehmann, Hopkins
      • Brad Pollock, UTHSC S Ant
      • Shamim Mollah, Rockefeller
      • Ida Sim, UCSF
      • Harold Solbrig, Mayo
      • Samson Tu, Stanford
      • Knut Wittkowski, Rockefeller
      • And: Rob Wynden (UCSF), Davera Gabriel (UCDavis), Herb Hagler (UTSW), Meredith Nahm (Duke), Swati Chakraborty (Duke), Jahangheer Shaik (WUSL), Aniket Bhandare (WUSL), Richard Scheuermann (UTSW), Alan Rector (U Manchester)
      BERD BERD
    • 3. Outline
      • HSDB Overview
      • Quick Pass Demo
      • Under the Hood
      • Deeper Dive Demo and Discussion
      • Summary
    • 4. Broad Long-Term Objective
      • Human studies a most valuable source of evidence
      • Goal is a federated, CTSA-wide database of past and ongoing human studies
        • interventional and observational
      • To enable large-scale computational reuse of human studies data for clinical and translational research
        • data mining
        • systematic review
        • planning future studies
    • 5. Go for the Gold? Main Results Table
      2.7 (1.1-4.1)    | 2.2 (1.7-3.4)
      121 (99-129)     | 110 (87-134)
      0.91 (0.93-1.04) | 0.83 (0.79-0.99)
      45.1 (39.9-50.5) | 46.4 (39.2-51.2)
    • 6. Need Standardized Metadata
      • e.g., SNOMED code for serum ionized calcium: 391084000
      Creatinine  | 2.7 (1.1-4.1)    | 2.2 (1.7-3.4)
      Weight (kg) | 121 (99-129)     | 110 (87-134)
      ICa         | 0.91 (0.93-1.04) | 0.83 (0.79-0.99)
      Age         | 45.1 (39.9-50.5) | 46.4 (39.2-51.2)
    • 7. Description of Study Protocol Critical for Interpreting Results
      • Baseline, primary, or secondary outcomes?
      Creatinine  | 2.7 (1.1-4.1)    | 2.2 (1.7-3.4)
      Weight (kg) | 121 (99-129)     | 110 (87-134)
      ICa         | 0.91 (0.93-1.04) | 0.83 (0.79-0.99)
      Age         | 45.1 (39.9-50.5) | 46.4 (39.2-51.2)
    • 8. Description of Study Protocol Critical for Interpreting Results
      • Baseline, primary, or secondary outcomes?
      • What do the columns represent?
                  | Chocolate        | Garlic
      Creatinine  | 2.7 (1.1-4.1)    | 2.2 (1.7-3.4)
      Weight (kg) | 121 (99-129)     | 110 (87-134)
      ICa         | 0.91 (0.93-1.04) | 0.83 (0.79-0.99)
      Age         | 45.1 (39.9-50.5) | 46.4 (39.2-51.2)
    • 9. Need Ontology of Clinical Research
      • Large scale computation of human studies data requires an ontology of study protocol metadata
        • for interpretive context around the results tables
                  | Chocolate        | Garlic
      Creatinine  | 2.7 (1.1-4.1)    | 2.2 (1.7-3.4)
      Weight (kg) | 121 (99-129)     | 110 (87-134)
      ICa         | 0.91 (0.93-1.04) | 0.83 (0.79-0.99)
      Age         | 45.1 (39.9-50.5) | 46.4 (39.2-51.2)
    • 10. HSDB Project Aims and Status
      • Define the Ontology of Clinical Research (OCRe)
        • modeled study design typology, interventions, outcomes, analyses, basic administrative data
      • Define the data sharing architecture using OCRe as the reference semantics
        • using semantic web technology (after several detours)
      • Pilot human studies data sharing from multiple CTSAs
        • demo of federated data queries over study designs
          • sample studies at Rockefeller (n=186), Hopkins (n=2), and UCSF (n=4)
        • sharing of summary-level results is Phase II
    • 11.
      • Quick Pass Demo
    • 12. 6 HIV and 186 Studies
      Columns: Studies; Study design; Intervention/Factor; Primary outcome(s); Secondary outcome(s)
      • Taha (#1, UCSF*)
        • Study design: Parallel group, randomized
        • Intervention/Factor: Arm 1: metronidazole + erythromycin; Arm 2: placebo
        • Outcomes (primary listed first, then secondary): Infant HIV Infection at 4-6 wks; Composite of infant HIV infection and mortality, at 1 year of age; Infant HIV Infection at 24-48 hours, and 12 months; etc.
      • Metzger (#2, UCSF*)
        • Study design: Parallel group, randomized
        • Intervention/Factor: Arm 1: Buprenorphine/Naloxone 3 wks + 52 wks; Arm 2: Buprenorphine/Naloxone max 18 days; All arms: counseling
        • Outcomes (primary listed first, then secondary): HIV-1 Infection or death, at 104 week visit; Death, through week 156; HIV-1 Infection every 6 months at scheduled follow up visits; etc.
      • German (#3, Hopkins)
        • Study design: Cohort
        • Intervention/Factor: HIV status at baseline
        • Outcomes (primary listed first, then secondary): Recognized HIV Infection, Wave 2 (3 years); Unrecognized HIV Infection, Wave 2 (3 years); etc.
      • Wawer (#4, Hopkins)
        • Intervention/Factor: Arm 1: Immediate circumcision; Arm 2: Delayed circumcision
        • Outcomes: Male-to-female HIV transmission, throughout study
      • El-Sadr (#5, UCSF*)
        • Study design: Cohort
        • Intervention/Factor: Assigned to drug conservation (DC) arm or assigned to viral suppression (VS) arm in SMART study
        • Outcomes (primary listed first, then secondary): HIV transmission risk behavior, end of study; HIV transmission risk behavior in participants who are not on ART at enrollment, end of study
      • Cohen (#6, UCSF*)
        • Study design: Cohort
        • Intervention/Factor: HIV-1 infection status: proven acute, established, or uninfected
        • Outcomes: Prevalence of acute HIV infection, throughout study; etc.
      • Rockefeller: 186 studies
      Row group labels: Interventional; Observational
    • 13.
      Diagram labels: Data Sources; Protocol Documents; Electronic IRB (iMedRIS); XML auto-generation; XML Manual; Bulk Upload; Local Servers; Johns Hopkins; Rockefeller; UCSF (AWS); Registry; Query Integrator; OCRe-XSD; OCRe
    • 14.
      • Retrieve all studies where a macrolide antibiotic was administered
      Calls BioPortal with a subsumption query on SNOMED, for children of “macrolide” [428787002]
      Searches for interventions within arms, for codes matching “macrolide” or children
    • 15.
      • Phase III Trial of Antibiotics to Reduce Chorioamnionitis-Related Perinatal HIV Transmission
    • 16. Outline
      • HSDB Overview
      • Quick Pass Demo
      • Under the Hood
        • OCRe
        • Data Sharing Architecture
        • Data Acquisition
        • Federated Query
      • Deeper Dive Demo and Discussion
      • Summary
    • 17.
      Diagram labels: Data Sources; Protocol Documents; Electronic IRB (iMedRIS); XML auto-generation; XML Manual; Bulk Upload; Local Servers; Johns Hopkins; Rockefeller; UCSF (AWS); Registry; Query Integrator; OCRe-XSD; OCRe
    • 18. OCRe
      • OWL 2.0 project at HSDBwiki.org, NCBO BioPortal
      • Models human studies for scientific query and analysis
      • Domain
        • all studies in which humans, parts of humans, or groups of humans are enrolled, exposed, or observed
      • Scope
        • all clinical domains, all variable types (quantitative, qualitative, imaging, genomics, etc.)
      Sim I, et al. AMIA CRI Summit 2010, p.51-55.
    • 19. OCRe Import Graph
      • OCRe: core OWL ontology containing all primitive concepts and relationships and selected defined classes
      • OCRe_ext: extended with application/project specific defined classes
      • HSDB_OCRe: includes HSDB information model annotations (for autogenerating XSD)
    • 20. Modeling Study Outcomes and Analyses
      • 4-6 weeks
      HIV Infection (Primary)
        has_code: 86406008
        has_code_system_name: SNOMED-CT
        has_code_system_version: 2011_01_31
        has_display_name: Human immunodeficiency virus infection
    • 21. Composite Outcomes: HIV Infection or Death at 1 year of age
      • Need an expression grammar (HIV infection OR Death)
      Var1: HIV Infection
      Var2: Death
      1 year of age
      Primary
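One way to picture the expression grammar this slide calls for is a small expression tree over outcome variables. This is only an illustrative sketch: the class names `Var`, `Or`, and `And` and the `evaluate` and `render` helpers are hypothetical, and OCRe's eventual grammar may look quite different.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Var:
    """A single outcome variable, e.g. HIV Infection."""
    name: str


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Or, And]


def evaluate(expr: Expr, observed: Dict[str, bool]) -> bool:
    """Decide whether the composite outcome occurred, given per-variable observations."""
    if isinstance(expr, Var):
        return observed[expr.name]
    if isinstance(expr, Or):
        return evaluate(expr.left, observed) or evaluate(expr.right, observed)
    if isinstance(expr, And):
        return evaluate(expr.left, observed) and evaluate(expr.right, observed)
    raise TypeError(f"unsupported expression: {expr!r}")


def render(expr: Expr) -> str:
    """Write the expression back out in the slide's notation."""
    if isinstance(expr, Var):
        return expr.name
    op = "OR" if isinstance(expr, Or) else "AND"
    return f"({render(expr.left)} {op} {render(expr.right)})"


# The composite primary outcome from the slide.
composite = Or(Var("HIV Infection"), Var("Death"))
```

Keeping the composite as a tree rather than a free-text label means a query can still find it when searching for either component variable.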
    • 22. Outline
      • HSDB Overview
      • Quick Pass Demo
      • Under the Hood
        • OCRe
        • Data Sharing Architecture
        • Data Acquisition
        • Federated Query
      • Deeper Dive Demo and Discussion
      • Summary
    • 23.
      Diagram labels: Data Sources; Protocol Documents; Electronic IRB (iMedRIS); XML auto-generation; XML Manual; Bulk Upload; Local Servers; Johns Hopkins; Rockefeller; UCSF (AWS); Registry; Query Integrator; OCRe-XSD; OCRe
    • 24. The HSDB 4-Quadrant Diagram
    • 25.
      • CShare, BRIDG, OCRe in UML, OpenMDR, CDEs, LexEVS, BioPortal, caGrid, SHRINE, Dynamic Extensions, HOM, i2b2, etc.
      October, 2010
    • 26.
      • Dropped OCRe in UML, CShare, caGrid
      April, 2011
    • 27.
      • Dropped OpenMDR, CDEs, SHRINE, i2b2, HOM
      • Moved to fully semantic web approach
      May, 2011
    • 28.
      • Dropped Dynamic Extensions, brought in VIVO/VITRO
      June, 2011
    • 29. October, 2010 October, 2011
    • 30.
      • For demo: staying in XML (not RDF), no logical curation, no data curation interface
    • 31. OCRe-XSD
      • Automatically generated from HSDB_OCRe
        • guided by annotations
      • Elements are indexed to OCRe IDs via purl URIs
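As a rough illustration of what "elements indexed to OCRe IDs" could look like, the sketch below emits XSD elements whose `xs:appinfo` carries a URI for the corresponding ontology class. Everything specific here is an assumption: the element names, the `purl.example.org` URIs, and the choice of `xs:appinfo` as the carrier are placeholders, not the project's actual generator or identifiers.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical annotated classes: (XSD element name, placeholder URI).
# Real OCRe purl URIs are not given in the slides.
ANNOTATED_CLASSES = [
    ("Study", "http://purl.example.org/ocre#Study"),
    ("Arm", "http://purl.example.org/ocre#Arm"),
]


def build_schema(classes):
    """Build a schema with one element per annotated class."""
    schema = ET.Element(f"{{{XS}}}schema")
    for name, uri in classes:
        element = ET.SubElement(schema, f"{{{XS}}}element", {"name": name})
        annotation = ET.SubElement(element, f"{{{XS}}}annotation")
        # The 'source' attribute of xs:appinfo points back to the ontology ID.
        ET.SubElement(annotation, f"{{{XS}}}appinfo", {"source": uri})
    return schema


xsd_text = ET.tostring(build_schema(ANNOTATED_CLASSES), encoding="unicode")
```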
    • 32. Outline
      • HSDB Overview
      • Quick Pass Demo
      • Under the Hood
        • OCRe
        • Data Sharing Architecture
        • Data Acquisition
        • Federated Query
      • Deeper Dive Demo and Discussion
      • Summary
    • 33.  
    • 34. RU HSDB Data Mapping Workflow
      • SQL Mapper
        • 1. Maps data elements using xsd
        • 2. Transforms extracted data using analytics (data conversion, masking, concatenation, etc.)
      • XML Generator
        • 1. Links IORG number
        • 2. Cleans textual data
        • 3. Generates xml file
      Diagram labels: Oracle DB; RU iMedRIS; HSDB xsd schema; SQL Mapper; XML Generator; RU HSDB xml; generates rules for mapping data elements; extracts data elements; generates data elements table
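The two stages named on this slide (a mapper that transforms extracted data, then a generator that cleans text and writes XML) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Rockefeller code: the column names, the `MAPPING` rules, the masked column, and the output element names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: target XML element -> source columns.
# Several source columns on one target means concatenation.
MAPPING = {
    "Title": ["TITLE"],
    "Investigator": ["PI_FIRST", "PI_LAST"],
}
MASKED_COLUMNS = {"PI_LAST"}  # hypothetical column to mask


def clean(text: str) -> str:
    """Strip control characters and collapse runs of whitespace."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def map_row(row: dict) -> dict:
    """Apply masking, cleaning, and concatenation to one extracted row."""
    mapped = {}
    for target, columns in MAPPING.items():
        parts = ["***" if col in MASKED_COLUMNS else clean(row[col]) for col in columns]
        mapped[target] = " ".join(parts)
    return mapped


def to_xml(mapped: dict, iorg: str) -> str:
    """Write one mapped row as a Study element linked to an IORG number."""
    study = ET.Element("Study", {"iorg": iorg})
    for tag, value in mapped.items():
        ET.SubElement(study, tag).text = value
    return ET.tostring(study, encoding="unicode")
```

For example, `to_xml(map_row(row), "IORG-PLACEHOLDER")` would turn one database row into one XML fragment; a real pipeline would then validate the result against the XSD.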
    • 35. Hopkins and UCSF Workflow
      • Manual selection of HIV-related protocols
        • from local IRB, ClinicalTrials.gov
      • Manual instantiation into XSD-conformant XML
        • using Oxygen XML editor
    • 36. Registry File of Published XML Instances
      • http://purl.org/sig/reg/hsdb/hsdb_demo_registry.xml
      • Rockefeller, 186 studies, at Rockefeller server
      • UCSF, 4 studies, at Amazon Web Services
      • Hopkins, 2 studies, at Hopkins server
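A client that federates over these sources might start by reading the registry to learn where each XML instance lives. The sketch below assumes a registry layout, since the slides do not show the actual format of hsdb_demo_registry.xml: the `registry` and `source` elements, their attributes, and the `example.org` URLs are invented for illustration. Only the site names and study counts come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry layout; the real file may be structured differently.
REGISTRY_XML = """
<registry>
  <source site="Rockefeller" studies="186" url="http://rockefeller.example.org/hsdb.xml"/>
  <source site="UCSF" studies="4" url="http://ucsf.example.org/hsdb.xml"/>
  <source site="Hopkins" studies="2" url="http://hopkins.example.org/hsdb.xml"/>
</registry>
"""


def read_registry(text: str):
    """Return (site, study count, url) for each registered XML instance."""
    root = ET.fromstring(text.strip())
    return [
        (src.get("site"), int(src.get("studies")), src.get("url"))
        for src in root.findall("source")
    ]


sources = read_registry(REGISTRY_XML)
total_studies = sum(count for _, count, _ in sources)
```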
    • 37. Outline
      • HSDB Overview
      • Quick Pass Demo
      • Under the Hood
        • OCRe
        • Data Sharing Architecture
        • Data Acquisition
        • Federated Query
      • Deeper Dive Demo and Discussion
      • Summary
    • 38.  
    • 39. Query Integrator: Brinkley Lab, U Wash
      • Write, save, reuse, and chain queries over any web accessible XML or RDF source
      • SPARQL, XQuery, IML, etc.
      http://www.si.washington.edu/projects/QI
    • 40.
      Diagram labels: BioPortal REST services (SNOMED); UCSF HSDB Data; OCRe in OWL; Remote Services; vSPARQL Service; DXQuery Service; Other Services; XML; RDF/OWL; Other Clients; QI Client; RDF Store; Query Database; QI Server; QES; QI Core; QI “Plugins”
    • 41. Outline
      • HSDB Overview
      • Quick Pass Demo
      • Under the Hood
        • OCRe
        • Data Sharing Architecture
        • Data Acquisition
        • Federated Query
      • Deeper Dive Demo and Discussion
      • Summary
    • 42. Four Illustrative Queries
      • Interventions
        • all studies administering macrolide
      • Study Design
        • all interventional studies
        • all placebo-controlled randomized studies
      • Outcomes
        • all studies with primary outcome of HIV Infection
    • 43. Demo: Query on Interventions
      • Interventions Query 1: all studies administering a macrolide
        • demonstrates query exploiting SNOMED’s semantic hierarchies
        • demonstrates modeling of arm structure in interventional studies
    • 44. Querying BioPortal
    • 45.
      • Retrieve all studies where a macrolide antibiotic was administered
      Chains a subsumption query to SNOMED, macrolide ID = 428787002
    • 46.
      • BioPortal SNOMED subclass query
      REST call to SNOMED in BioPortal
      Cleans and returns all subclasses of SNOMED ID (e.g., 428787002 for Macrolide)
    • 47. SNOMED Results: All Children of 428787002
    • 48.
      • Retrieve all studies where a macrolide antibiotic was administered
      Matches studies where Arm tags contain SNOMED code of Macrolide or its children
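The matching step on this slide can be sketched in Python over a toy XML instance. The demo itself runs XQuery inside the Query Integrator; this is a stand-in technique, and the element names, attribute names, study IDs, and the `code-erythromycin` child code are all hypothetical. Only the macrolide ID 428787002 and the placebo ID 182886004 come from the slides.

```python
import xml.etree.ElementTree as ET

# Toy instance with a hypothetical layout, not the real OCRe-XSD structure.
STUDIES_XML = """<Studies>
  <Study id="study-1">
    <Arm name="Arm 1"><Intervention code="code-erythromycin"/></Arm>
    <Arm name="Arm 2"><Intervention code="182886004"/></Arm>
  </Study>
  <Study id="study-2">
    <Arm name="Arm 1"><Intervention code="code-other"/></Arm>
  </Study>
</Studies>"""

# Macrolide plus its children, as a subsumption query would return them.
# 'code-erythromycin' is a placeholder, not a real SNOMED code.
MACROLIDE_CODES = {"428787002", "code-erythromycin"}


def studies_with_codes(xml_text: str, codes: set) -> list:
    """Return IDs of studies where any arm has an intervention code in 'codes'."""
    root = ET.fromstring(xml_text)
    hits = []
    for study in root.findall("Study"):
        arm_codes = {i.get("code") for i in study.findall("./Arm/Intervention")}
        if arm_codes & codes:
            hits.append(study.get("id"))
    return hits


macrolide_studies = studies_with_codes(STUDIES_XML, MACROLIDE_CODES)
```

Because interventions sit inside explicitly modeled arms, the same function could report which arm matched by iterating over `Arm` elements instead of pooling their codes.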
    • 49. Arm Structure Explicitly Modeled
    • 50. Query on Study Design
      • Design Query 1: all interventional studies
        • demonstrates use of OCRe’s study design typology
          • returns “parallel group” studies
    • 51.
      • Retrieve all interventional studies
      Finding all study designs matching OCRe ID for “interventional” or its children
    • 52.
      Diagram labels: OCRe study design typology; OCRe_XSD; XML instance
    • 53.  
    • 54. Query on Study Design
      • Design Query 1: all interventional studies
        • demonstrates use of OCRe’s study design typology
          • returns “parallel group” studies
      • Design Query 2: placebo-controlled RCTs
        • demonstrates explicit modeling
          • Intervention = placebo (re-using “macrolide” query)
          • StudyDesign = parallel group
          • AllocationType = Random Allocation, or OCRe child of
    • 55.
      • Retrieve all placebo-controlled randomized trials
      Finding all study designs matching OCRe ID for “parallel group”
      Finding allocation schemes under OCRe’s “random allocation” hierarchy
      Finding interventions = SNOMED code for “placebo” [182886004]
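The three conditions on this slide combine into a single filter, sketched below over plain dictionaries. This is an illustration under assumptions rather than the demo's query: the allocation hierarchy entries other than "random allocation" are placeholders for OCRe's actual subclasses, and the study records are invented.

```python
from typing import Dict, List, Set

PLACEBO = "182886004"  # SNOMED code for placebo, as given on the slide

# Hypothetical stand-in for OCRe's allocation type hierarchy.
ALLOCATION_CHILDREN: Dict[str, List[str]] = {
    "random allocation": ["random-child-1", "random-child-2"],
    "random-child-1": ["random-grandchild-1"],
}


def with_descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with everything below it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for kid in children.get(stack.pop(), []):
            if kid not in found:
                found.add(kid)
                stack.append(kid)
    return found


# Invented study records for illustration.
STUDIES = [
    {"id": "study-1", "design": "parallel group",
     "allocation": "random-grandchild-1", "intervention_codes": {"code-x", PLACEBO}},
    {"id": "study-2", "design": "parallel group",
     "allocation": "random allocation", "intervention_codes": {"code-x", "code-y"}},
    {"id": "study-3", "design": "cohort",
     "allocation": "none", "intervention_codes": set()},
]


def placebo_controlled_rcts(studies: List[dict]) -> List[str]:
    """Parallel group design, random allocation (or a child), and a placebo arm."""
    random_types = with_descendants("random allocation", ALLOCATION_CHILDREN)
    return [
        s["id"] for s in studies
        if s["design"] == "parallel group"
        and s["allocation"] in random_types
        and PLACEBO in s["intervention_codes"]
    ]


rct_ids = placebo_controlled_rcts(STUDIES)
```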
    • 56. OCRe Hierarchy of Allocation Type
    • 57. Query on Outcome Variables
      • Outcome Query 1: Any study for which HIV infection is a Primary outcome (single variable outcome)
        • demonstrates use of SNOMED hierarchy for “HIV infection”
        • illustrates timepoint-specific primary and secondary outcomes
    • 58.
      • All studies with HIV Infection as any single variable outcome
      Same BioPortal SNOMED subsumption call as for Macrolide query
      Matches any outcome variable code to SNOMED ID for HIV Infection or children
    • 59.
      • All studies with HIV infection as a Primary single variable outcome
      Call query for HIV infection as a single variable outcome with outcome priority = Primary
    • 60.
      Primary outcome is HIV Infection at 4-6 weeks
      HIV Infection at 24-48 hours, and at 12 months, are Secondary outcomes
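The outcome queries on slides 58 to 60 chain one saved query into another: a general query finds HIV Infection outcomes, and a narrower one reuses it with a priority filter. A minimal sketch of that chaining follows. The record layout and the `child-code-placeholder` entry are assumptions; the code 86406008 and the timepoints are taken from the slides.

```python
from typing import List

HIV_INFECTION = "86406008"  # SNOMED code for HIV infection, from slide 20

# HIV Infection plus children, as the subsumption call would return them.
# The second entry is a placeholder, not a real SNOMED code.
HIV_CODES = {HIV_INFECTION, "child-code-placeholder"}

# Invented outcome records; timepoints for study-1 follow slide 60.
OUTCOMES = [
    {"study": "study-1", "code": HIV_INFECTION, "priority": "Primary", "timepoint": "4-6 weeks"},
    {"study": "study-1", "code": HIV_INFECTION, "priority": "Secondary", "timepoint": "24-48 hours"},
    {"study": "study-1", "code": HIV_INFECTION, "priority": "Secondary", "timepoint": "12 months"},
    {"study": "study-2", "code": "other-code", "priority": "Primary", "timepoint": "end of study"},
]


def hiv_outcomes(outcomes: List[dict]) -> List[dict]:
    """First query: any single variable outcome coded as HIV Infection or a child."""
    return [o for o in outcomes if o["code"] in HIV_CODES]


def primary_hiv_outcomes(outcomes: List[dict]) -> List[dict]:
    """Second query: reuses the first and keeps only Primary outcomes."""
    return [o for o in hiv_outcomes(outcomes) if o["priority"] == "Primary"]


primary = primary_hiv_outcomes(OUTCOMES)
```

Keeping the timepoint on each record is what lets the same variable appear once as a primary outcome and twice as a secondary outcome within one study.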
    • 61.
      • Discussion
    • 62. Outline
      • HSDB Overview
      • Quick Pass Demo
      • Under the Hood
        • OCRe
        • Data Sharing Architecture
        • Data Acquisition
        • Federated Query
      • Deeper Dive Demo and Discussion
      • Summary
    • 63. Summary
      • Human studies are the most valuable source of evidence on therapies, etc., and should be computable at large scale
      • Requires standardized semantics of study protocol features
      • HSDB Project has demonstrated
        • OCRe can be used to describe range of studies
        • can capture OCRe-standardized data from source systems via XSD schema
        • can federate data queries over XML instances, OCRe, and SNOMED (via live queries to BioPortal)
      • Data acquisition remains challenging
    • 64. Future Work
      • OCRe
        • expression grammar for composite outcomes
        • ERGO for eligibility criteria, summary-level results
      • Data federation
        • converting XML instances to RDF
        • data curation user interface
        • friendlier query interface
      • Data acquisition
        • mapping and bulk uploads from local source systems
        • policy and staffing issues
    • 65. Federate Your Data With Us!
      • Bulk transformation of instances
        • map from your local schema to our XSD
        • generate XML instance file
      • Curate
        • when data acquisition/curation interface available, help test
        • curate data
      • Publish locally
        • may install local Query Integrator
    • 66. Links
      • Project Links
        • HSDB http://hsdbwiki.org/
        • OCRe http://code.google.com/p/ontology-of-clinical-research/
        • Query Integrator http://sig.biostr.washington.edu/projects/queryintegrator
      • Contacts
        • Overall: Ida Sim, [email_address]
        • OCRe: Samson Tu, [email_address]
        • Data federation:
          • Jim Brinkley, [email_address]
          • Todd Detwiler, [email_address]