Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Health Sciences Research Informatics, Powered by Globus

63 views

Published on

This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Jonathan Silverstein and Mike Davis, both from University of Pittsburgh.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Health Sciences Research Informatics, Powered by Globus

  1. 1. Enabling Cancer Research with End-to-end Infrastructure for the Request, Exploration, and Receipt of Cancer Registry Data https://www.dbmi.pitt.edu/ https://rio.pitt.edu Jonathan C. Silverstein, MD, MS, FACS, FACMI Chief Research Informatics Officer Health Sciences and Institute for Precision Medicine Professor Department of Biomedical Informatics (DBMI), University of Pittsburgh Michael Davis, MS Software Integration Architect Department of Biomedical Informatics (DBMI), University of Pittsburgh
  2. 2. Research Informatics Office (RIO.pitt.edu) • 25 staff in DBMI with mission “to support investigators through innovative collection and use of biomedical data” • Science-as-a-Service (scientists serving scientists) • Clinical Data Science Projects • Basic Data Science Projects • Translational Data Science Projects
  3. 3. R3 Health Record Research Request
  4. 4. Neptune/R3 (w Shyam Visweswaran)
  5. 5. HuBMAP Service Architecture (w Nick Nystrom)
  6. 6. CR3 Collaborators • University of Pittsburgh – Biomedical Informatics (Jonathan Silverstein) – Institute for Precision Medicine (Adrian Lee) • UPMC – Hillman Cancer Center (Robert Ferris) – Network Cancer Registry (Sharon Winters) • University of Chicago – Globus (Ian Foster) NCI U24 CA209996, PI: Ian T. Foster
  7. 7. CURRENT STATE: Step 1: Define desired data set Researcher talks to registrar about what kind of patient cohort they want. Registar queries the registry for the researchers. RegistrarResearcher Lengthy, iterative, labor-intensive process No mechanism for ad hoc exploration of the data by the researcher
  8. 8. CURRENT STATE: Step 2: Deliver data to researcher Registrar delivers data to researcher Registar extracts data from Registry RegistrarResearcher Registrar delivers data via email, OneDrive, Sharepoint, mapped drive No tracking/auditing, error prone, insecure
  9. 9. FUTURE STATE: Step 1: Define desired data set Goal 1: Provide mechanism for self service exploration of registry data Registar queries the registry for the researchers. Researchers queries Registry RegistrarResearcher CR3
  10. 10. FUTURE STATE: Step 2: Deliver data to researcher RegistrarResearcher Registrar delivers data to researcher Registar extracts data from Registry Goal 2: Provide secure, auditable, access controlled mechanism to share data
  11. 11. Research Portal Advanced Features • Modern technical approach (SaaS deployment) • Federated Identity (each institution does authentication) • Honest Brokerage with clinical data (local control) • NAACCR Standard data (each institution can generate) • Search via Globus (one search traverses institutions) • Easy GUI to develop and use (includes faceted search) • Easy search transmission (search “sentence” to registrar) • Secure Data Liquidity (easy to move data to authorized)
  12. 12. Cancer Registry Records for Research (CR3) Portal
  13. 13. Cancer Registry Records for Research (CR3) Portal Feature Summary • Portal code developed using Globus portal framework (Django/python) based on Modern Research Data Portal design pattern (PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/) • Federated logon using Globus Auth (GA) with Pitt/UPMC as identity providers • Cancer registry data ingested into Globus Search (GS) • Aggregate charts (i.e., age range, gender) • Google-like text search with collapsible facet panels for filtering
  14. 14. CR3 Discovery Portal Cohort aggregate counts Login with UPMC/Pitt credentials Globus Search (GS) Globus Auth (GA) Elasticsearch Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging) UPMC/Pitt Identity Providers Authentication Auth initiated to GA Cohort search initiated to GS Researcher Cohort aggregate counts returned Email Request sent to CR CR3 Architecture v1.0 Registry Staff (Globus Django Framework)
  15. 15. Summary of data stored in Globus Search Index • Focused on 7 primary sites (i.e., ICD-O-3 codes) • Breast (e.g., C500-C509), Colon, Lung, Skin, Head and Neck, Ovary, Prostate • 30 common data elements (CDE) (based on North American Association of Central Cancer Registries (NAACCR) standard and limited to Safe Harbor) • Limit to 2 UPMC hospitals (UPMC operates 40 academic, community, and specialty hospitals) • UPMC Presbyterian Shadyside • UPMC Magee-Womens Hospital • Years 2004 -2017 (based on the availability of linking other clinical data from our Neptune data warehouse in the future) • Included Completed and Reportable cases only (NAACCR standards) • Roughly 65,000 patients Globus Search Cancer Registry DB
  16. 16. Extraction Pipeline into Globus Search CSV CSV CSV Globus Search Transformation Process (python) Cancer Registry DB *Data extracts Transformation Process (python) Ingestion Process (python) GMetaEntry format (Safe Harbor Data) Cancer Registry (Honest Broker) Informatics (Honest Broker) * Future version will generate/accept the NAACCR XML format
  17. 17. { "@version": "2016-11-09", "ingest_data": { "@version": "2017-09-01", "gmeta": [ { "@version": "2017-09-01", "subject": "https://dbmi.pitt.edu/subject/201201295:00:BRCA", "visible_to": [ "urn:globus:groups:id:ddb13d88-d6df-11e8-9daa-0e368f3075e8" ], "content": { "clinical_t": "T2", "mets_at_dx": "No distant metastasis", "recurrence": "No", "vitals": "Alive", "clinical_n": "N0", "race": "White", "histology": "Infiltrating duct carcinoma, NOS", "path_stage": "2A", "facility": "Magee", "histology_code": "85003", "disease_category": "Breast", "dx_age_range": "35-44", "grade": "Grade II: Mod diff, mod well diff,", "spanish_hispanic_origin": "Non-Spanish; non-Hispanic", "year_last_contact": "2017", "cs_stage": "2A", "site": "Breast, NOS", "cs_extension": "100", "clinical_stage": "2A", "cause_of_death": "Not applicable", "path_n": "N0", "path_t": "T2", "cancer_status": "No evidence of this tumor", "age_at_dx": "36", "gender": "Female", "summary_stage": "Localized", "clinical_m": "M0", "laterality": "Left - origin of primary", "dx_year": "2012", "site_code": "C509" } } ] } } GMetaEntry data in GS visible_to: content: (scope limited to defined Globus Group)
  18. 18. Long Term Goals • Create a network of federated cancer registries – Deploy similar infrastructure at other cancer registries – Enable queries across multiple registries – Model proven in ITCR TIES/TCRN project with independent installation of TIES or connect a TIES installation to TCRN for multi-institutional queries • Federation via Globus allows network to scale while control remains with each institution – Data owners input and export data sets, apply quality control – Data owners control all access policies – No overhead from ingestion of disparate datasets – Registry data remains at the location where it was generated – Identities are provided and authenticated by the institution, not Globus – System can scale almost infinitely if data owners provide their own storage resources
  19. 19. CR3 Discovery Portal Cohort aggregate counts Login with UPMC/Pitt credentials Globus Search (GS) Globus Auth (GA) UPMC/Pitt Identity Providers Authentication Auth initiated to GA Cohort search initiated to GS Researcher Cohort aggregate counts returned CR3 Architecture v2.0 Globus Transfer (GT) Registry Staff Data transfer from registrar to researcher mediated by GT Manage authorization Elasticsearch Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging) Request Service
  20. 20. Questions? j.c.s@pitt.edu midavis@pitt.edu

×