SlideShare a Scribd company logo
1 of 28
Download to read offline
Enabling Cancer Research with End-to-end
Infrastructure for the Request, Exploration, and
Receipt of Cancer Registry Data
https://www.dbmi.pitt.edu/ https://rio.pitt.edu
Jonathan C. Silverstein, MD, MS, FACS, FACMI
Chief Research Informatics Officer
Health Sciences and Institute for Precision Medicine
Professor
Department of Biomedical Informatics (DBMI), University of Pittsburgh
Michael Davis, MS
Software Integration Architect
Department of Biomedical Informatics (DBMI), University of Pittsburgh
Research Informatics Office (RIO.pitt.edu)
• 25 staff in DBMI with mission “to support investigators
through innovative collection and use of biomedical data”
• Science-as-a-Service (scientists serving scientists)
• Clinical Data Science Projects
• Basic Data Science Projects
• Translational Data Science Projects
R3 Health Record
Research Request
Neptune/R3 (w Shyam Visweswaran)
HuBMAP Service Architecture (w Nick Nystrom)
CR3 Collaborators
• University of Pittsburgh
– Biomedical Informatics (Jonathan Silverstein)
– Institute for Precision Medicine (Adrian Lee)
• UPMC
– Hillman Cancer Center (Robert Ferris)
– Network Cancer Registry (Sharon Winters)
• University of Chicago
– Globus (Ian Foster)
NCI U24 CA209996, PI: Ian T. Foster
CURRENT STATE: Step 1: Define desired data set
Researcher talks to registrar
about what kind of patient
cohort they want.
Registar queries
the registry for the
researchers.
RegistrarResearcher
Lengthy, iterative, labor-intensive process
No mechanism for ad hoc exploration of the data by the researcher
CURRENT STATE: Step 2: Deliver data to researcher
Registrar delivers data to
researcher
Registar extracts
data from Registry
RegistrarResearcher
Registrar delivers data via email, OneDrive, Sharepoint, mapped drive
No tracking/auditing, error prone, insecure
FUTURE STATE: Step 1: Define desired data set
Goal 1: Provide mechanism for self service exploration of registry data
Registar queries
the registry for the
researchers.
Researchers queries Registry
RegistrarResearcher
CR3
FUTURE STATE: Step 2: Deliver data to researcher
RegistrarResearcher
Registrar delivers data to
researcher
Registar extracts
data from Registry
Goal 2: Provide secure, auditable, access controlled mechanism to share data
Research Portal Advanced Features
• Modern technical approach (SaaS deployment)
• Federated Identity (each institution does authentication)
• Honest Brokerage with clinical data (local control)
• NAACCR Standard data (each institution can generate)
• Search via Globus (one search traverses institutions)
• Easy GUI to develop and use (includes faceted search)
• Easy search transmission (search “sentence” to registrar)
• Secure Data Liquidity (easy to move data to authorized)
Cancer Registry Records for Research (CR3)
Portal
Cancer Registry Records for Research (CR3)
Portal Feature Summary
• Portal code developed using Globus portal framework
(Django/python) based on Modern Research Data Portal
design pattern (PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/)
• Federated logon using Globus Auth (GA) with Pitt/UPMC
as identity providers
• Cancer registry data ingested into Globus Search (GS)
• Aggregate charts (i.e., age range, gender)
• Google-like text search with collapsible facet panels for
filtering
CR3
Discovery
Portal
Cohort
aggregate
counts
Login with
UPMC/Pitt
credentials
Globus
Search (GS)
Globus
Auth (GA)
Elasticsearch
Cancer Registry De-identified Data Index
(minimal criteria data: e.g., staging)
UPMC/Pitt
Identity
Providers
Authentication
Auth
initiated
to GA
Cohort
search
initiated to
GS
Researcher
Cohort
aggregate
counts
returned
Email Request
sent to CR
CR3 Architecture v1.0
Registry Staff
(Globus Django Framework)
Summary of data stored in Globus Search Index
• Focused on 7 primary sites (i.e., ICD-O-3 codes)
• Breast (e.g., C500-C509), Colon, Lung, Skin, Head and Neck, Ovary, Prostate
• 30 common data elements (CDE) (based on North American Association of Central Cancer
Registries (NAACCR) standard and limited to Safe Harbor)
• Limit to 2 UPMC hospitals (UPMC operates 40 academic, community, and specialty
hospitals)
• UPMC Presbyterian Shadyside
• UPMC Magee-Womens Hospital
• Years 2004 -2017 (based on the availability of linking other clinical data from our Neptune data
warehouse in the future)
• Included Completed and Reportable cases only (NAACCR standards)
• Roughly 65,000 patients
Globus Search
Cancer
Registry DB
Extraction Pipeline into Globus Search
CSV
CSV
CSV
Globus Search
Transformation
Process
(python)
Cancer
Registry DB
*Data
extracts
Transformation
Process
(python)
Ingestion
Process
(python)
GMetaEntry
format
(Safe Harbor Data)
Cancer Registry (Honest Broker) Informatics (Honest Broker)
* Future version will generate/accept the NAACCR XML format
{
"@version": "2016-11-09",
"ingest_data": {
"@version": "2017-09-01",
"gmeta": [
{
"@version": "2017-09-01",
"subject": "https://dbmi.pitt.edu/subject/201201295:00:BRCA",
"visible_to": [
"urn:globus:groups:id:ddb13d88-d6df-11e8-9daa-0e368f3075e8"
],
"content": {
"clinical_t": "T2",
"mets_at_dx": "No distant metastasis",
"recurrence": "No",
"vitals": "Alive",
"clinical_n": "N0",
"race": "White",
"histology": "Infiltrating duct carcinoma, NOS",
"path_stage": "2A",
"facility": "Magee",
"histology_code": "85003",
"disease_category": "Breast",
"dx_age_range": "35-44",
"grade": "Grade II: Mod diff, mod well diff,",
"spanish_hispanic_origin": "Non-Spanish; non-Hispanic",
"year_last_contact": "2017",
"cs_stage": "2A",
"site": "Breast, NOS",
"cs_extension": "100",
"clinical_stage": "2A",
"cause_of_death": "Not applicable",
"path_n": "N0",
"path_t": "T2",
"cancer_status": "No evidence of this tumor",
"age_at_dx": "36",
"gender": "Female",
"summary_stage": "Localized",
"clinical_m": "M0",
"laterality": "Left - origin of primary",
"dx_year": "2012",
"site_code": "C509"
}
}
]
}
}
GMetaEntry data in GS
visible_to:
content:
(scope limited to defined
Globus Group)
Long Term Goals
• Create a network of federated cancer registries
– Deploy similar infrastructure at other cancer registries
– Enable queries across multiple registries
– Model proven in ITCR TIES/TCRN project with independent installation
of TIES or connect a TIES installation to TCRN for multi-institutional
queries
• Federation via Globus allows network to scale while
control remains with each institution
– Data owners input and export data sets, apply quality control
– Data owners control all access policies
– No overhead from ingestion of disparate datasets
– Registry data remains at the location where it was generated
– Identities are provided and authenticated by the institution, not Globus
– System can scale almost infinitely if data owners provide their own
storage resources
CR3
Discovery
Portal
Cohort
aggregate
counts
Login with
UPMC/Pitt
credentials
Globus
Search (GS)
Globus
Auth (GA)
UPMC/Pitt
Identity
Providers
Authentication
Auth
initiated
to GA
Cohort
search
initiated to
GS
Researcher
Cohort
aggregate
counts
returned
CR3 Architecture v2.0
Globus
Transfer (GT)
Registry Staff
Data transfer from registrar to
researcher mediated by GT
Manage
authorization
Elasticsearch
Cancer Registry De-identified Data Index
(minimal criteria data: e.g., staging)
Request
Service
Questions?
j.c.s@pitt.edu
midavis@pitt.edu

More Related Content

What's hot

Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSEd Dodds
 
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...Robert Meusel
 
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Informationballoon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference InformationKai Schlegel
 
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...Micah Altman
 
Search Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited LectureSearch Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited LectureChris Bizer
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterRobert H. McDonald
 
Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)Globus
 
Adoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical DomainsAdoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical DomainsChris Bizer
 
DBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataDBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataChris Bizer
 
CCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud PlatformCCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud PlatformYaoyu Wang
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsPeter Haase
 
2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortarOpen Analytics
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterIan Foster
 
Proposal for open government data
Proposal for open government dataProposal for open government data
Proposal for open government dataMahmoud Jalajel
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Michele Pasin
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...Micah Altman
 
Honey on the Wire KohaCon18
Honey on the Wire  KohaCon18Honey on the Wire  KohaCon18
Honey on the Wire KohaCon18Joy Nelson
 
Real-time Data Analytics mit Elasticsearch
Real-time Data Analytics mit ElasticsearchReal-time Data Analytics mit Elasticsearch
Real-time Data Analytics mit Elasticsearchinovex GmbH
 

What's hot (20)

Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...
 
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Informationballoon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
 
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
New Discovery Tools for Digital Humanities and Spatial Data (Summary of the J...
 
Search Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited LectureSearch Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited Lecture
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
 
Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)
 
Adoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical DomainsAdoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical Domains
 
DBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataDBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of Data
 
CCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud PlatformCCCB Germline Variant Analysis on Cloud Platform
CCCB Germline Variant Analysis on Cloud Platform
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
 
2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
 
Proposal for open government data
Proposal for open government dataProposal for open government data
Proposal for open government data
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
 
Finding Data Sets
Finding Data SetsFinding Data Sets
Finding Data Sets
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
 
Honey on the Wire KohaCon18
Honey on the Wire  KohaCon18Honey on the Wire  KohaCon18
Honey on the Wire KohaCon18
 
Data Infrastructure in Kumparan
Data Infrastructure in KumparanData Infrastructure in Kumparan
Data Infrastructure in Kumparan
 
Real-time Data Analytics mit Elasticsearch
Real-time Data Analytics mit ElasticsearchReal-time Data Analytics mit Elasticsearch
Real-time Data Analytics mit Elasticsearch
 

Similar to Health Sciences Research Informatics, Powered by Globus

CPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data CommonsCPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data Commonsimgcommcall
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
cBioPortal Webinar Slides (2/3)
cBioPortal Webinar Slides (2/3)cBioPortal Webinar Slides (2/3)
cBioPortal Webinar Slides (2/3)Pistoia Alliance
 
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web ApproachesEnabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web ApproachesTom Plasterer
 
Brisbane Health-y Data: RedCap
Brisbane Health-y Data: RedCapBrisbane Health-y Data: RedCap
Brisbane Health-y Data: RedCapARDC
 
FAIR as a Working Principle for Cancer Genomic Data
FAIR as a Working Principle for Cancer Genomic DataFAIR as a Working Principle for Cancer Genomic Data
FAIR as a Working Principle for Cancer Genomic DataIan Fore
 
GlobusWorld 2019 Opening Keynote
GlobusWorld 2019 Opening KeynoteGlobusWorld 2019 Opening Keynote
GlobusWorld 2019 Opening KeynoteGlobus
 
NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR Warren Kibbe
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...confluent
 
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)niranabey
 
Company Overview Brochure_FMDKL
Company Overview Brochure_FMDKLCompany Overview Brochure_FMDKL
Company Overview Brochure_FMDKLKendall Smith
 
Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Joel Saltz
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 

Similar to Health Sciences Research Informatics, Powered by Globus (20)

CPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data CommonsCPTAC Data Portal and Proteomics Data Commons
CPTAC Data Portal and Proteomics Data Commons
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
cBioPortal Webinar Slides (2/3)
cBioPortal Webinar Slides (2/3)cBioPortal Webinar Slides (2/3)
cBioPortal Webinar Slides (2/3)
 
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web ApproachesEnabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches
 
Brisbane Health-y Data: RedCap
Brisbane Health-y Data: RedCapBrisbane Health-y Data: RedCap
Brisbane Health-y Data: RedCap
 
FAIR as a Working Principle for Cancer Genomic Data
FAIR as a Working Principle for Cancer Genomic DataFAIR as a Working Principle for Cancer Genomic Data
FAIR as a Working Principle for Cancer Genomic Data
 
GlobusWorld 2019 Opening Keynote
GlobusWorld 2019 Opening KeynoteGlobusWorld 2019 Opening Keynote
GlobusWorld 2019 Opening Keynote
 
NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use caseEnabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
 
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)
TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)
 
Ncicbiit
NcicbiitNcicbiit
Ncicbiit
 
Markham2009
Markham2009Markham2009
Markham2009
 
Company Overview Brochure_FMDKL
Company Overview Brochure_FMDKLCompany Overview Brochure_FMDKL
Company Overview Brochure_FMDKL
 
Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-Review
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Big Data in Clinical Research
Big Data in Clinical ResearchBig Data in Clinical Research
Big Data in Clinical Research
 

More from Globus

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration TopicsGlobus
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowGlobus
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaSGlobus
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesGlobus
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusGlobus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for ResearchersGlobus
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with GlobusGlobus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsGlobus
 
Globus Automation
Globus AutomationGlobus Automation
Globus AutomationGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 

More from Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 

Recently uploaded

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 

Recently uploaded (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 

Health Sciences Research Informatics, Powered by Globus

  • 1. Enabling Cancer Research with End-to-end Infrastructure for the Request, Exploration, and Receipt of Cancer Registry Data https://www.dbmi.pitt.edu/ https://rio.pitt.edu Jonathan C. Silverstein, MD, MS, FACS, FACMI Chief Research Informatics Officer Health Sciences and Institute for Precision Medicine Professor Department of Biomedical Informatics (DBMI), University of Pittsburgh Michael Davis, MS Software Integration Architect Department of Biomedical Informatics (DBMI), University of Pittsburgh
  • 2. Research Informatics Office (RIO.pitt.edu) • 25 staff in DBMI with mission “to support investigators through innovative collection and use of biomedical data” • Science-as-a-Service (scientists serving scientists) • Clinical Data Science Projects • Basic Data Science Projects • Translational Data Science Projects
  • 4. Neptune/R3 (w Shyam Visweswaran)
  • 5. HuBMAP Service Architecture (w Nick Nystrom)
  • 6. CR3 Collaborators • University of Pittsburgh – Biomedical Informatics (Jonathan Silverstein) – Institute for Precision Medicine (Adrian Lee) • UPMC – Hillman Cancer Center (Robert Ferris) – Network Cancer Registry (Sharon Winters) • University of Chicago – Globus (Ian Foster) NCI U24 CA209996, PI: Ian T. Foster
  • 7. CURRENT STATE: Step 1: Define desired data set Researcher talks to registrar about what kind of patient cohort they want. Registar queries the registry for the researchers. RegistrarResearcher Lengthy, iterative, labor-intensive process No mechanism for ad hoc exploration of the data by the researcher
  • 8. CURRENT STATE: Step 2: Deliver data to researcher Registrar delivers data to researcher Registar extracts data from Registry RegistrarResearcher Registrar delivers data via email, OneDrive, Sharepoint, mapped drive No tracking/auditing, error prone, insecure
  • 9. FUTURE STATE: Step 1: Define desired data set Goal 1: Provide mechanism for self service exploration of registry data Registar queries the registry for the researchers. Researchers queries Registry RegistrarResearcher CR3
  • 10. FUTURE STATE: Step 2: Deliver data to researcher RegistrarResearcher Registrar delivers data to researcher Registar extracts data from Registry Goal 2: Provide secure, auditable, access controlled mechanism to share data
  • 11. Research Portal Advanced Features • Modern technical approach (SaaS deployment) • Federated Identity (each institution does authentication) • Honest Brokerage with clinical data (local control) • NAACCR Standard data (each institution can generate) • Search via Globus (one search traverses institutions) • Easy GUI to develop and use (includes faceted search) • Easy search transmission (search “sentence” to registrar) • Secure Data Liquidity (easy to move data to authorized)
  • 12. Cancer Registry Records for Research (CR3) Portal
  • 13. Cancer Registry Records for Research (CR3) Portal Feature Summary • Portal code developed using Globus portal framework (Django/python) based on Modern Research Data Portal design pattern (PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/) • Federated logon using Globus Auth (GA) with Pitt/UPMC as identity providers • Cancer registry data ingested into Globus Search (GS) • Aggregate charts (i.e., age range, gender) • Google-like text search with collapsible facet panels for filtering
  • 14. CR3 Discovery Portal Cohort aggregate counts Login with UPMC/Pitt credentials Globus Search (GS) Globus Auth (GA) Elasticsearch Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging) UPMC/Pitt Identity Providers Authentication Auth initiated to GA Cohort search initiated to GS Researcher Cohort aggregate counts returned Email Request sent to CR CR3 Architecture v1.0 Registry Staff (Globus Django Framework)
  • 15. Summary of data stored in Globus Search Index • Focused on 7 primary sites (i.e., ICD-O-3 codes) • Breast (e.g., C500-C509), Colon, Lung, Skin, Head and Neck, Ovary, Prostate • 30 common data elements (CDE) (based on North American Association of Central Cancer Registries (NAACCR) standard and limited to Safe Harbor) • Limit to 2 UPMC hospitals (UPMC operates 40 academic, community, and specialty hospitals) • UPMC Presbyterian Shadyside • UPMC Magee-Womens Hospital • Years 2004 -2017 (based on the availability of linking other clinical data from our Neptune data warehouse in the future) • Included Completed and Reportable cases only (NAACCR standards) • Roughly 65,000 patients Globus Search Cancer Registry DB
  • 16. Extraction Pipeline into Globus Search CSV CSV CSV Globus Search Transformation Process (python) Cancer Registry DB *Data extracts Transformation Process (python) Ingestion Process (python) GMetaEntry format (Safe Harbor Data) Cancer Registry (Honest Broker) Informatics (Honest Broker) * Future version will generate/accept the NAACCR XML format
  • 17. { "@version": "2016-11-09", "ingest_data": { "@version": "2017-09-01", "gmeta": [ { "@version": "2017-09-01", "subject": "https://dbmi.pitt.edu/subject/201201295:00:BRCA", "visible_to": [ "urn:globus:groups:id:ddb13d88-d6df-11e8-9daa-0e368f3075e8" ], "content": { "clinical_t": "T2", "mets_at_dx": "No distant metastasis", "recurrence": "No", "vitals": "Alive", "clinical_n": "N0", "race": "White", "histology": "Infiltrating duct carcinoma, NOS", "path_stage": "2A", "facility": "Magee", "histology_code": "85003", "disease_category": "Breast", "dx_age_range": "35-44", "grade": "Grade II: Mod diff, mod well diff,", "spanish_hispanic_origin": "Non-Spanish; non-Hispanic", "year_last_contact": "2017", "cs_stage": "2A", "site": "Breast, NOS", "cs_extension": "100", "clinical_stage": "2A", "cause_of_death": "Not applicable", "path_n": "N0", "path_t": "T2", "cancer_status": "No evidence of this tumor", "age_at_dx": "36", "gender": "Female", "summary_stage": "Localized", "clinical_m": "M0", "laterality": "Left - origin of primary", "dx_year": "2012", "site_code": "C509" } } ] } } GMetaEntry data in GS visible_to: content: (scope limited to defined Globus Group)
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26. Long Term Goals • Create a network of federated cancer registries – Deploy similar infrastructure at other cancer registries – Enable queries across multiple registries – Model proven in ITCR TIES/TCRN project with independent installation of TIES or connect a TIES installation to TCRN for multi-institutional queries • Federation via Globus allows network to scale while control remains with each institution – Data owners input and export data sets, apply quality control – Data owners control all access policies – No overhead from ingestion of disparate datasets – Registry data remains at the location where it was generated – Identities are provided and authenticated by the institution, not Globus – System can scale almost infinitely if data owners provide their own storage resources
  • 27. CR3 Discovery Portal Cohort aggregate counts Login with UPMC/Pitt credentials Globus Search (GS) Globus Auth (GA) UPMC/Pitt Identity Providers Authentication Auth initiated to GA Cohort search initiated to GS Researcher Cohort aggregate counts returned CR3 Architecture v2.0 Globus Transfer (GT) Registry Staff Data transfer from registrar to researcher mediated by GT Manage authorization Elasticsearch Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging) Request Service