Health Sciences Research Informatics, Powered by Globus

Globus
Enabling Cancer Research with End-to-end
Infrastructure for the Request, Exploration, and
Receipt of Cancer Registry Data
https://www.dbmi.pitt.edu/ https://rio.pitt.edu
Jonathan C. Silverstein, MD, MS, FACS, FACMI
Chief Research Informatics Officer
Health Sciences and Institute for Precision Medicine
Professor
Department of Biomedical Informatics (DBMI), University of Pittsburgh
Michael Davis, MS
Software Integration Architect
Department of Biomedical Informatics (DBMI), University of Pittsburgh
Research Informatics Office (RIO.pitt.edu)
• 25 staff in DBMI with mission “to support investigators
through innovative collection and use of biomedical data”
• Science-as-a-Service (scientists serving scientists)
• Clinical Data Science Projects
• Basic Data Science Projects
• Translational Data Science Projects
R3 Health Record
Research Request
Neptune/R3 (w Shyam Visweswaran)
HuBMAP Service Architecture (w Nick Nystrom)
CR3 Collaborators
• University of Pittsburgh
– Biomedical Informatics (Jonathan Silverstein)
– Institute for Precision Medicine (Adrian Lee)
• UPMC
– Hillman Cancer Center (Robert Ferris)
– Network Cancer Registry (Sharon Winters)
• University of Chicago
– Globus (Ian Foster)
NCI U24 CA209996, PI: Ian T. Foster
CURRENT STATE: Step 1: Define desired data set
Researcher talks to registrar
about what kind of patient
cohort they want.
Registar queries
the registry for the
researchers.
RegistrarResearcher
Lengthy, iterative, labor-intensive process
No mechanism for ad hoc exploration of the data by the researcher
CURRENT STATE: Step 2: Deliver data to researcher
Registrar delivers data to
researcher
Registar extracts
data from Registry
RegistrarResearcher
Registrar delivers data via email, OneDrive, Sharepoint, mapped drive
No tracking/auditing, error prone, insecure
FUTURE STATE: Step 1: Define desired data set
Goal 1: Provide mechanism for self service exploration of registry data
Registar queries
the registry for the
researchers.
Researchers queries Registry
RegistrarResearcher
CR3
FUTURE STATE: Step 2: Deliver data to researcher
RegistrarResearcher
Registrar delivers data to
researcher
Registar extracts
data from Registry
Goal 2: Provide secure, auditable, access controlled mechanism to share data
Research Portal Advanced Features
• Modern technical approach (SaaS deployment)
• Federated Identity (each institution does authentication)
• Honest Brokerage with clinical data (local control)
• NAACCR Standard data (each institution can generate)
• Search via Globus (one search traverses institutions)
• Easy GUI to develop and use (includes faceted search)
• Easy search transmission (search “sentence” to registrar)
• Secure Data Liquidity (easy to move data to authorized)
Cancer Registry Records for Research (CR3)
Portal
Cancer Registry Records for Research (CR3)
Portal Feature Summary
• Portal code developed using Globus portal framework
(Django/python) based on Modern Research Data Portal
design pattern (PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/)
• Federated logon using Globus Auth (GA) with Pitt/UPMC
as identity providers
• Cancer registry data ingested into Globus Search (GS)
• Aggregate charts (i.e., age range, gender)
• Google-like text search with collapsible facet panels for
filtering
CR3
Discovery
Portal
Cohort
aggregate
counts
Login with
UPMC/Pitt
credentials
Globus
Search (GS)
Globus
Auth (GA)
Elasticsearch
Cancer Registry De-identified Data Index
(minimal criteria data: e.g., staging)
UPMC/Pitt
Identity
Providers
Authentication
Auth
initiated
to GA
Cohort
search
initiated to
GS
Researcher
Cohort
aggregate
counts
returned
Email Request
sent to CR
CR3 Architecture v1.0
Registry Staff
(Globus Django Framework)
Summary of data stored in Globus Search Index
• Focused on 7 primary sites (i.e., ICD-O-3 codes)
• Breast (e.g., C500-C509), Colon, Lung, Skin, Head and Neck, Ovary, Prostate
• 30 common data elements (CDE) (based on North American Association of Central Cancer
Registries (NAACCR) standard and limited to Safe Harbor)
• Limit to 2 UPMC hospitals (UPMC operates 40 academic, community, and specialty
hospitals)
• UPMC Presbyterian Shadyside
• UPMC Magee-Womens Hospital
• Years 2004 -2017 (based on the availability of linking other clinical data from our Neptune data
warehouse in the future)
• Included Completed and Reportable cases only (NAACCR standards)
• Roughly 65,000 patients
Globus Search
Cancer
Registry DB
Extraction Pipeline into Globus Search
CSV
CSV
CSV
Globus Search
Transformation
Process
(python)
Cancer
Registry DB
*Data
extracts
Transformation
Process
(python)
Ingestion
Process
(python)
GMetaEntry
format
(Safe Harbor Data)
Cancer Registry (Honest Broker) Informatics (Honest Broker)
* Future version will generate/accept the NAACCR XML format
{
"@version": "2016-11-09",
"ingest_data": {
"@version": "2017-09-01",
"gmeta": [
{
"@version": "2017-09-01",
"subject": "https://dbmi.pitt.edu/subject/201201295:00:BRCA",
"visible_to": [
"urn:globus:groups:id:ddb13d88-d6df-11e8-9daa-0e368f3075e8"
],
"content": {
"clinical_t": "T2",
"mets_at_dx": "No distant metastasis",
"recurrence": "No",
"vitals": "Alive",
"clinical_n": "N0",
"race": "White",
"histology": "Infiltrating duct carcinoma, NOS",
"path_stage": "2A",
"facility": "Magee",
"histology_code": "85003",
"disease_category": "Breast",
"dx_age_range": "35-44",
"grade": "Grade II: Mod diff, mod well diff,",
"spanish_hispanic_origin": "Non-Spanish; non-Hispanic",
"year_last_contact": "2017",
"cs_stage": "2A",
"site": "Breast, NOS",
"cs_extension": "100",
"clinical_stage": "2A",
"cause_of_death": "Not applicable",
"path_n": "N0",
"path_t": "T2",
"cancer_status": "No evidence of this tumor",
"age_at_dx": "36",
"gender": "Female",
"summary_stage": "Localized",
"clinical_m": "M0",
"laterality": "Left - origin of primary",
"dx_year": "2012",
"site_code": "C509"
}
}
]
}
}
GMetaEntry data in GS
visible_to:
content:
(scope limited to defined
Globus Group)
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Long Term Goals
• Create a network of federated cancer registries
– Deploy similar infrastructure at other cancer registries
– Enable queries across multiple registries
– Model proven in ITCR TIES/TCRN project with independent installation
of TIES or connect a TIES installation to TCRN for multi-institutional
queries
• Federation via Globus allows network to scale while
control remains with each institution
– Data owners input and export data sets, apply quality control
– Data owners control all access policies
– No overhead from ingestion of disparate datasets
– Registry data remains at the location where it was generated
– Identities are provided and authenticated by the institution, not Globus
– System can scale almost infinitely if data owners provide their own
storage resources
CR3
Discovery
Portal
Cohort
aggregate
counts
Login with
UPMC/Pitt
credentials
Globus
Search (GS)
Globus
Auth (GA)
UPMC/Pitt
Identity
Providers
Authentication
Auth
initiated
to GA
Cohort
search
initiated to
GS
Researcher
Cohort
aggregate
counts
returned
CR3 Architecture v2.0
Globus
Transfer (GT)
Registry Staff
Data transfer from registrar to
researcher mediated by GT
Manage
authorization
Elasticsearch
Cancer Registry De-identified Data Index
(minimal criteria data: e.g., staging)
Request
Service
Questions?
j.c.s@pitt.edu
midavis@pitt.edu
1 of 28

Recommended

Data Sharing via Globus in the NIH Intramural Program by
Data Sharing via Globus in the NIH Intramural ProgramData Sharing via Globus in the NIH Intramural Program
Data Sharing via Globus in the NIH Intramural ProgramGlobus
630 views12 slides
Enabling Secure Data Discoverability (SC21 Tutorial) by
Enabling Secure Data Discoverability (SC21 Tutorial)Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)Globus
106 views80 slides
Recent Upgrades to ARM Data Transfer and Delivery Using Globus by
Recent Upgrades to ARM Data Transfer and Delivery Using GlobusRecent Upgrades to ARM Data Transfer and Delivery Using Globus
Recent Upgrades to ARM Data Transfer and Delivery Using GlobusGlobus
379 views11 slides
SomeSlides by
SomeSlidesSomeSlides
SomeSlidesguestd60742
496 views40 slides
Grid Computing July 2009 by
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009Ian Foster
1.6K views57 slides
Cenitpede: Analyzing Webcrawl by
Cenitpede: Analyzing WebcrawlCenitpede: Analyzing Webcrawl
Cenitpede: Analyzing WebcrawlPrimal Pappachan
3.3K views17 slides

More Related Content

What's hot

Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS by
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSEd Dodds
594 views13 slides
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ... by
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...Robert Meusel
1.8K views27 slides
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information by
balloon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Informationballoon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference InformationKai Schlegel
1.1K views26 slides
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J... by
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...Micah Altman
2K views45 slides
Search Joins with the Web - ICDT2014 Invited Lecture by
Search Joins with the Web - ICDT2014 Invited LectureSearch Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited LectureChris Bizer
3.5K views73 slides
Elephant in the Room: Scaling Storage for the HathiTrust Research Center by
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterRobert H. McDonald
1.4K views13 slides

What's hot(20)

Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS by Ed Dodds
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Ed Dodds594 views
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ... by Robert Meusel
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
Robert Meusel1.8K views
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information by Kai Schlegel
balloon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Informationballoon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
Kai Schlegel1.1K views
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J... by Micah Altman
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
Micah Altman2K views
Search Joins with the Web - ICDT2014 Invited Lecture by Chris Bizer
Search Joins with the Web - ICDT2014 Invited LectureSearch Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited Lecture
Chris Bizer3.5K views
Elephant in the Room: Scaling Storage for the HathiTrust Research Center by Robert H. McDonald
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Robert H. McDonald1.4K views
Globus Integrations (GlobusWorld Tour - UMich) by Globus
Globus Integrations (GlobusWorld Tour - UMich)Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)
Globus 74 views
Adoption of the Linked Data Best Practices in Different Topical Domains by Chris Bizer
Adoption of the Linked Data Best Practices in Different Topical DomainsAdoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical Domains
Chris Bizer3.1K views
DBpedia - An Interlinking Hub in the Web of Data by Chris Bizer
DBpedia - An Interlinking Hub in the Web of DataDBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of Data
Chris Bizer2.5K views
CCCB Germline Variant Analysis on Cloud Platform by Yaoyu Wang
CCCB Germline Variant Analysis on Cloud PlatformCCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud Platform
Yaoyu Wang159 views
Discovering Related Data Sources in Data Portals by Peter Haase
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
Peter Haase1.5K views
2013 open analytics-meetup-mortar by Open Analytics
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar
Open Analytics1.6K views
Scaling collaborative data science with Globus and Jupyter by Ian Foster
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
Ian Foster809 views
Proposal for open government data by Mahmoud Jalajel
Proposal for open government dataProposal for open government data
Proposal for open government data
Mahmoud Jalajel171 views
Linked data experience at Macmillan: Building discovery services for scientif... by Michele Pasin
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
Michele Pasin1.4K views
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA... by Micah Altman
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
Micah Altman2.2K views
Honey on the Wire KohaCon18 by Joy Nelson
Honey on the Wire  KohaCon18Honey on the Wire  KohaCon18
Honey on the Wire KohaCon18
Joy Nelson97 views
Real-time Data Analytics mit Elasticsearch by inovex GmbH
Real-time Data Analytics mit ElasticsearchReal-time Data Analytics mit Elasticsearch
Real-time Data Analytics mit Elasticsearch
inovex GmbH1.3K views

Similar to Health Sciences Research Informatics, Powered by Globus

CPTAC Data Portal and Proteomics Data Commons by
CPTAC Data Portal and Proteomics Data CommonsCPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data Commonsimgcommcall
238 views28 slides
Data Harmonization for a Molecularly Driven Health System by
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
501 views69 slides
Overview of Next Gen Sequencing Data Analysis by
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisBioinformatics and Computational Biosciences Branch
6.5K views73 slides
cBioPortal Webinar Slides (2/3) by
cBioPortal Webinar Slides (2/3)cBioPortal Webinar Slides (2/3)
cBioPortal Webinar Slides (2/3)Pistoia Alliance
2.5K views38 slides
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches by
Enabling Discovery in High-Risk Plaque using Semantic Web ApproachesEnabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web ApproachesTom Plasterer
810 views15 slides
Brisbane Health-y Data: RedCap by
Brisbane Health-y Data: RedCapBrisbane Health-y Data: RedCap
Brisbane Health-y Data: RedCapARDC
1.3K views15 slides

Similar to Health Sciences Research Informatics, Powered by Globus(20)

CPTAC Data Portal and Proteomics Data Commons by imgcommcall
CPTAC Data Portal and Proteomics Data CommonsCPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data Commons
imgcommcall238 views
Data Harmonization for a Molecularly Driven Health System by Warren Kibbe
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe501 views
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches by Tom Plasterer
Enabling Discovery in High-Risk Plaque using Semantic Web ApproachesEnabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches
Tom Plasterer810 views
Brisbane Health-y Data: RedCap by ARDC
Brisbane Health-y Data: RedCapBrisbane Health-y Data: RedCap
Brisbane Health-y Data: RedCap
ARDC1.3K views
FAIR as a Working Principle for Cancer Genomic Data by Ian Fore
FAIR as a Working Principle for Cancer Genomic DataFAIR as a Working Principle for Cancer Genomic Data
FAIR as a Working Principle for Cancer Genomic Data
Ian Fore87 views
GlobusWorld 2019 Opening Keynote by Globus
GlobusWorld 2019 Opening KeynoteGlobusWorld 2019 Opening Keynote
GlobusWorld 2019 Opening Keynote
Globus 246 views
NCI Cancer Genomics, Open Science and PMI: FAIR by Warren Kibbe
NCI Cancer Genomics, Open Science and PMI: FAIR NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR
Warren Kibbe975 views
Being FAIR: Enabling Reproducible Data Science by Carole Goble
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
Carole Goble1.2K views
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S... by confluent
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
confluent2.1K views
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB) by niranabey
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)
niranabey1.1K views
Company Overview Brochure_FMDKL by Kendall Smith
Company Overview Brochure_FMDKLCompany Overview Brochure_FMDKL
Company Overview Brochure_FMDKL
Kendall Smith202 views
Computational Pathology Workshop July 8 2014 by Joel Saltz
Computational Pathology Workshop July 8 2014Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014
Joel Saltz592 views
Peter Embi's 2011 AMIA CRI Year-in-Review by Peter Embi
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi538 views
GlobusWorld 2020 Keynote by Globus
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
Globus 434 views

More from Globus

Introduction to Globus for System Administrators by
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
12 views55 slides
Introduction to Data Transfer and Sharing for Researchers by
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
5 views33 slides
Introduction to the Globus Platform for Developers by
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
4 views28 slides
Introduction to the Command Line Interface (CLI) by
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
15 views12 slides
Automating Research Data with Globus Flows and Compute by
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
9 views60 slides
Automating Research Data Flows and Introduction to the Globus Platform by
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
50 views41 slides

More from Globus (20)

Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 12 views
Introduction to Data Transfer and Sharing for Researchers by Globus
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
Globus 5 views
Introduction to the Globus Platform for Developers by Globus
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
Globus 4 views
Introduction to the Command Line Interface (CLI) by Globus
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
Globus 15 views
Automating Research Data with Globus Flows and Compute by Globus
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
Globus 9 views
Automating Research Data Flows and Introduction to the Globus Platform by Globus
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
Globus 50 views
Advanced Globus System Administration by Globus
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
Globus 26 views
Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 96 views
Introduction to Globus for New Users by Globus
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
Globus 55 views
Working with Globus Platform Services and Portals by Globus
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
Globus 28 views
Globus Automation by Globus
Globus AutomationGlobus Automation
Globus Automation
Globus 23 views
Advanced Globus System Administration by Globus
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
Globus 21 views
Introduction to Globus by Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
Globus 43 views
Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 27 views
Working with Globus Platform Services by Globus
Working with Globus Platform ServicesWorking with Globus Platform Services
Working with Globus Platform Services
Globus 42 views
Advanced Globus System Administration by Globus
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
Globus 29 views
Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 147 views
Using Globus to Streamline Research at Scale by Globus
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at Scale
Globus 30 views
Introduction to Globus for Researchers by Globus
Introduction to Globus for ResearchersIntroduction to Globus for Researchers
Introduction to Globus for Researchers
Globus 89 views
Automating Research Data Flows and an Introduction to the Globus Platform by Globus
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus Platform
Globus 132 views

Recently uploaded

Future of AR - Facebook Presentation by
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook PresentationRob McCarty
62 views27 slides
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... by
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...ShapeBlue
88 views13 slides
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... by
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...ShapeBlue
144 views12 slides
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...James Anderson
156 views32 slides
The Role of Patterns in the Era of Large Language Models by
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language ModelsYunyao Li
80 views65 slides
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...ShapeBlue
154 views62 slides

Recently uploaded(20)

Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty62 views
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... by ShapeBlue
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
ShapeBlue88 views
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... by ShapeBlue
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
ShapeBlue144 views
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson156 views
The Role of Patterns in the Era of Large Language Models by Yunyao Li
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
Yunyao Li80 views
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by ShapeBlue
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
ShapeBlue154 views
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti... by ShapeBlue
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
ShapeBlue98 views
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ... by ShapeBlue
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
ShapeBlue85 views
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue by ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlueMigrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
ShapeBlue176 views
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
ShapeBlue93 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker50 views
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPool by ShapeBlue
Extending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPoolExtending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPool
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPool
ShapeBlue84 views
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue by ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue179 views
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue79 views
Data Integrity for Banking and Financial Services by Precisely
Data Integrity for Banking and Financial ServicesData Integrity for Banking and Financial Services
Data Integrity for Banking and Financial Services
Precisely78 views
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ... by ShapeBlue
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
ShapeBlue146 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash153 views

Health Sciences Research Informatics, Powered by Globus

  • 1. Enabling Cancer Research with End-to-end Infrastructure for the Request, Exploration, and Receipt of Cancer Registry Data https://www.dbmi.pitt.edu/ https://rio.pitt.edu Jonathan C. Silverstein, MD, MS, FACS, FACMI Chief Research Informatics Officer Health Sciences and Institute for Precision Medicine Professor Department of Biomedical Informatics (DBMI), University of Pittsburgh Michael Davis, MS Software Integration Architect Department of Biomedical Informatics (DBMI), University of Pittsburgh
  • 2. Research Informatics Office (RIO.pitt.edu) • 25 staff in DBMI with mission “to support investigators through innovative collection and use of biomedical data” • Science-as-a-Service (scientists serving scientists) • Clinical Data Science Projects • Basic Data Science Projects • Translational Data Science Projects
  • 4. Neptune/R3 (w Shyam Visweswaran)
  • 5. HuBMAP Service Architecture (w Nick Nystrom)
  • 6. CR3 Collaborators • University of Pittsburgh – Biomedical Informatics (Jonathan Silverstein) – Institute for Precision Medicine (Adrian Lee) • UPMC – Hillman Cancer Center (Robert Ferris) – Network Cancer Registry (Sharon Winters) • University of Chicago – Globus (Ian Foster) NCI U24 CA209996, PI: Ian T. Foster
  • 7. CURRENT STATE: Step 1: Define desired data set Researcher talks to registrar about what kind of patient cohort they want. Registar queries the registry for the researchers. RegistrarResearcher Lengthy, iterative, labor-intensive process No mechanism for ad hoc exploration of the data by the researcher
  • 8. CURRENT STATE: Step 2: Deliver data to researcher Registrar delivers data to researcher Registar extracts data from Registry RegistrarResearcher Registrar delivers data via email, OneDrive, Sharepoint, mapped drive No tracking/auditing, error prone, insecure
  • 9. FUTURE STATE: Step 1: Define desired data set Goal 1: Provide mechanism for self service exploration of registry data Registar queries the registry for the researchers. Researchers queries Registry RegistrarResearcher CR3
  • 10. FUTURE STATE: Step 2: Deliver data to researcher RegistrarResearcher Registrar delivers data to researcher Registar extracts data from Registry Goal 2: Provide secure, auditable, access controlled mechanism to share data
  • 11. Research Portal Advanced Features • Modern technical approach (SaaS deployment) • Federated Identity (each institution does authentication) • Honest Brokerage with clinical data (local control) • NAACCR Standard data (each institution can generate) • Search via Globus (one search traverses institutions) • Easy GUI to develop and use (includes faceted search) • Easy search transmission (search “sentence” to registrar) • Secure Data Liquidity (easy to move data to authorized)
  • 12. Cancer Registry Records for Research (CR3) Portal
  • 13. Cancer Registry Records for Research (CR3) Portal Feature Summary • Portal code developed using Globus portal framework (Django/python) based on Modern Research Data Portal design pattern (PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/) • Federated logon using Globus Auth (GA) with Pitt/UPMC as identity providers • Cancer registry data ingested into Globus Search (GS) • Aggregate charts (i.e., age range, gender) • Google-like text search with collapsible facet panels for filtering
  • 14. CR3 Discovery Portal Cohort aggregate counts Login with UPMC/Pitt credentials Globus Search (GS) Globus Auth (GA) Elasticsearch Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging) UPMC/Pitt Identity Providers Authentication Auth initiated to GA Cohort search initiated to GS Researcher Cohort aggregate counts returned Email Request sent to CR CR3 Architecture v1.0 Registry Staff (Globus Django Framework)
  • 15. Summary of data stored in Globus Search Index • Focused on 7 primary sites (i.e., ICD-O-3 codes) • Breast (e.g., C500-C509), Colon, Lung, Skin, Head and Neck, Ovary, Prostate • 30 common data elements (CDE) (based on North American Association of Central Cancer Registries (NAACCR) standard and limited to Safe Harbor) • Limit to 2 UPMC hospitals (UPMC operates 40 academic, community, and specialty hospitals) • UPMC Presbyterian Shadyside • UPMC Magee-Womens Hospital • Years 2004 -2017 (based on the availability of linking other clinical data from our Neptune data warehouse in the future) • Included Completed and Reportable cases only (NAACCR standards) • Roughly 65,000 patients Globus Search Cancer Registry DB
  • 16. Extraction Pipeline into Globus Search CSV CSV CSV Globus Search Transformation Process (python) Cancer Registry DB *Data extracts Transformation Process (python) Ingestion Process (python) GMetaEntry format (Safe Harbor Data) Cancer Registry (Honest Broker) Informatics (Honest Broker) * Future version will generate/accept the NAACCR XML format
  • 17. { "@version": "2016-11-09", "ingest_data": { "@version": "2017-09-01", "gmeta": [ { "@version": "2017-09-01", "subject": "https://dbmi.pitt.edu/subject/201201295:00:BRCA", "visible_to": [ "urn:globus:groups:id:ddb13d88-d6df-11e8-9daa-0e368f3075e8" ], "content": { "clinical_t": "T2", "mets_at_dx": "No distant metastasis", "recurrence": "No", "vitals": "Alive", "clinical_n": "N0", "race": "White", "histology": "Infiltrating duct carcinoma, NOS", "path_stage": "2A", "facility": "Magee", "histology_code": "85003", "disease_category": "Breast", "dx_age_range": "35-44", "grade": "Grade II: Mod diff, mod well diff,", "spanish_hispanic_origin": "Non-Spanish; non-Hispanic", "year_last_contact": "2017", "cs_stage": "2A", "site": "Breast, NOS", "cs_extension": "100", "clinical_stage": "2A", "cause_of_death": "Not applicable", "path_n": "N0", "path_t": "T2", "cancer_status": "No evidence of this tumor", "age_at_dx": "36", "gender": "Female", "summary_stage": "Localized", "clinical_m": "M0", "laterality": "Left - origin of primary", "dx_year": "2012", "site_code": "C509" } } ] } } GMetaEntry data in GS visible_to: content: (scope limited to defined Globus Group)
  • 26. Long Term Goals • Create a network of federated cancer registries – Deploy similar infrastructure at other cancer registries – Enable queries across multiple registries – Model proven in ITCR TIES/TCRN project with independent installation of TIES or connect a TIES installation to TCRN for multi-institutional queries • Federation via Globus allows network to scale while control remains with each institution – Data owners input and export data sets, apply quality control – Data owners control all access policies – No overhead from ingestion of disparate datasets – Registry data remains at the location where it was generated – Identities are provided and authenticated by the institution, not Globus – System can scale almost infinitely if data owners provide their own storage resources
  • 27. CR3 Discovery Portal Cohort aggregate counts Login with UPMC/Pitt credentials Globus Search (GS) Globus Auth (GA) UPMC/Pitt Identity Providers Authentication Auth initiated to GA Cohort search initiated to GS Researcher Cohort aggregate counts returned CR3 Architecture v2.0 Globus Transfer (GT) Registry Staff Data transfer from registrar to researcher mediated by GT Manage authorization Elasticsearch Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging) Request Service