APPROACHES TO COMBINING
SUPPLEMENTARY DATASETS ACROSS
MULTIPLE TRUSTED RESEARCH
ENVIRONMENTS USING FEDERATED
ANALYSIS
DR. ARAVIND SESAGIRI RAAMKUMAR
FEDERATED ANALYSIS (FA)
• Analysis of datasets or data sources located in different geographic locations or networks
• Raw data are generally not shared during the analysis
• Instead, statistical parameters or aggregated data are exchanged during the analysis
• The efficiency of FA projects depends on both communication bandwidth and computational complexity
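The following Python sketch makes the idea of exchanging only aggregates concrete. It pools a mean and sample variance across sites, assuming each site can run a small function locally and release three numbers: count, sum, and sum of squares. The names `summarize_site` and `pool` are illustrative and do not come from any FA framework. The sum-of-squares formula is simple but can lose precision for large values. A production version would instead combine per-site means and sums of squared deviations.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class SiteSummary:
    """The only values a site releases: no individual records."""
    n: int
    total: float
    total_sq: float


def summarize_site(values: Sequence[float]) -> SiteSummary:
    # Runs inside each site's trusted environment.
    return SiteSummary(n=len(values),
                       total=sum(values),
                       total_sq=sum(v * v for v in values))


def pool(summaries: Sequence[SiteSummary]) -> tuple[float, float]:
    # Runs at the coordinator: combines per-site aggregates into the
    # pooled mean and sample variance (needs at least 2 values overall).
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    site_a = summarize_site([2.0, 4.0, 6.0])
    site_b = summarize_site([8.0, 10.0])
    print(pool([site_a, site_b]))
```

Suppose site A holds 2, 4, 6 and site B holds 8, 10. The coordinator recovers the pooled mean 6.0 and variance 10.0 without receiving any individual value.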
APPROACHES
• Linked Data Approach
• OMOP CDM Approach
• Custom API Approach
• Dedicated Environment Approach
• Hybrid Approach
Linked Data Approach
• Linked Data is based on Semantic Web standards
• Data need to be represented in the Resource Description Framework (RDF) format
• Each data item needs to be assigned a URI (Uniform Resource Identifier)
• Relations and class hierarchies in the datasets need to be represented using ontologies
• Data can be queried through SPARQL endpoints
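The toy Python sketch below shows these ideas at small scale. Every resource is named by a URI, and facts are stored as subject-predicate-object triples. A query is a pattern with variables that gets matched against the graph. The `http://example.org/` namespace and the patient data are hypothetical. The matcher is a simplified SPARQL basic graph pattern and does not replace a real RDF store or SPARQL engine.

```python
from __future__ import annotations

from typing import Iterator

EX = "http://example.org/"  # hypothetical namespace for illustration
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Triple = tuple[str, str, str]

# Every resource is a URI; the label "Dementia" is a literal.
GRAPH: set[Triple] = {
    (EX + "patient/1", RDF_TYPE, EX + "Patient"),
    (EX + "patient/1", EX + "hasDiagnosis", EX + "condition/dementia"),
    (EX + "patient/2", RDF_TYPE, EX + "Patient"),
    (EX + "condition/dementia", RDFS_LABEL, "Dementia"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: list[Triple], graph: set[Triple],
          binding: dict[str, str] | None = None) -> Iterator[dict[str, str]]:
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # A variable must agree with any value it already has.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


query = [("?p", RDF_TYPE, EX + "Patient"),
         ("?p", EX + "hasDiagnosis", "?c"),
         ("?c", RDFS_LABEL, "?label")]
print(list(match(query, GRAPH)))
```

The query returns a single binding: patient/1 with the label "Dementia". Patient/2 is excluded because it has no diagnosis triple.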
Linked Data Approach - Examples
https://dbpedia.org/page/Dementia
https://dbpedia.org/sparql
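A SPARQL endpoint such as the DBpedia one above is an HTTP service. The Python sketch below uses two W3C standards. The SPARQL 1.1 Protocol passes the query as the `query` parameter of a GET request. The SPARQL 1.1 Query Results JSON Format defines the shape of the response. The sketch builds the request and flattens JSON bindings but sends nothing; the function names are illustrative.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_request(query: str,
                         endpoint: str = DBPEDIA_ENDPOINT) -> urllib.request.Request:
    # SPARQL 1.1 Protocol: the query travels URL-encoded in the "query"
    # parameter of a GET request; the Accept header asks for JSON results.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body: str) -> list[dict[str, str]]:
    # JSON results: results.bindings is a list of
    # {variable: {"type": ..., "value": ...}}; keep only the values.
    doc = json.loads(body)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in doc["results"]["bindings"]]


QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Dementia> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}"""

req = build_sparql_request(QUERY)
```

To run the query against the live endpoint, pass `req` to `urllib.request.urlopen(req, timeout=30)`, then feed the decoded response body to `parse_bindings`. The endpoint's availability and rate limits are outside the sketch's control.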
Linked Data Approach – Federated Querying Example
# SPARQL 1.1 federated query: books and the films based on them come from
# DBpedia; IMDb links come from LinkedMDB (that endpoint may no longer be online)
PREFIX dc:   <http://purl.org/dc/terms/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbp:  <http://dbpedia.org/property/>
SELECT DISTINCT ?name ?authname ?filmname ?imdbID WHERE {
  # Part 1: evaluated remotely by the DBpedia endpoint
  SERVICE <http://dbpedia.org/sparql> {
    ?book rdf:type dbo:Book .
    ?book foaf:name ?name .
    ?book dbp:author ?author .
    ?author foaf:name ?authname .
    ?book ^dbo:basedOn ?movie .          # i.e. ?movie dbo:basedOn ?book
    ?movie a dbo:Film .
    ?movie foaf:name ?filmname .
    FILTER (str(?name) IN ("Royal Flash", "White Oleander", "Possession: A Romance",
                           "Misery", "Intensity", "The War of The Roses", "Momo",
                           "The Sicilian", "Derailed", "Ragtime"))
  }
  # Part 2: evaluated remotely by the LinkedMDB endpoint
  SERVICE <http://data.linkedmdb.org/sparql> {
    ?lmdbFilm dc:title ?title .
    ?lmdbFilm foaf:page ?imdbID .
    FILTER (regex(str(?imdbID), "www.imdb.com"))
  }
  # ?filmname is a literal, so it cannot be the subject of a triple;
  # join the two sources on the title string instead
  FILTER (str(?title) = str(?filmname))
}
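A federated engine typically evaluates each SERVICE block at its own endpoint, then joins the partial results locally on shared values. The Python sketch below performs that local step as a hash join on the film title string. The row dictionaries and the placeholder IMDb URL are hypothetical stand-ins for real endpoint results. Its case-folded, whitespace-normalized comparison is deliberately looser than exact string equality.

```python
from __future__ import annotations

from collections import defaultdict


def normalize(title: str) -> str:
    # Case-fold and collapse whitespace so minor formatting differences
    # between the two sources do not prevent a match.
    return " ".join(title.casefold().split())


def join_on_title(dbpedia_rows: list[dict[str, str]],
                  linkedmdb_rows: list[dict[str, str]]) -> list[dict[str, str]]:
    # Build phase: index the second source's rows by normalized title.
    index: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in linkedmdb_rows:
        index[normalize(row["title"])].append(row)
    # Probe phase: each first-source row picks up every matching IMDb link.
    joined = []
    for row in dbpedia_rows:
        for match in index.get(normalize(row["filmname"]), []):
            joined.append({**row, "imdbID": match["imdbID"]})
    return joined


# Hypothetical rows standing in for the results of the two SERVICE blocks.
books = [{"name": "Misery", "filmname": "Misery"},
         {"name": "Ragtime", "filmname": "Ragtime"}]
films = [{"title": "misery ", "imdbID": "https://www.imdb.com/title/EXAMPLE-1"}]
print(join_on_title(books, films))
```

Shipping both partial results to one place and joining there is why federated queries are sensitive to bandwidth. Engines therefore try to push filters, such as the title list, into each SERVICE block.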
OMOP CDM Approach
• The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) approach enables federated analysis by having every site convert its data into a common data format
• The common format includes standardized concepts (terminologies), vocabularies, and coding schemes
• OHDSI (Observational Health Data Sciences and Informatics), the community that maintains the OMOP CDM, provides a suite of open-source tools for analysis
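The Python sketch below shows why a common data model enables federated analysis. Two sites with different local coding map their records into the same CDM shape. After that, one query runs unchanged at both sites and only counts are shared. The column names follow a subset of the OMOP CDM `CONDITION_OCCURRENCE` table, and `0` follows the CDM convention for "no matching concept". The concept IDs and local codes are placeholders; real mappings come from the OHDSI standardized vocabularies.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical (vocabulary, source code) -> standard concept_id mapping.
# 9000001 is a placeholder, not a real OMOP concept ID.
CONCEPT_MAP = {
    ("ICD10", "F03"): 9000001,       # site A codes dementia in ICD-10
    ("LOCAL_B", "DEM-01"): 9000001,  # site B uses an in-house code
}


@dataclass
class ConditionOccurrence:
    # Subset of the OMOP CDM CONDITION_OCCURRENCE columns.
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


def to_cdm(person_id: int, vocabulary: str, code: str,
           start: date) -> ConditionOccurrence:
    # Unmapped source codes get concept_id 0 ("no matching concept").
    concept_id = CONCEPT_MAP.get((vocabulary, code), 0)
    return ConditionOccurrence(person_id, concept_id, start, code)


def count_persons(rows: list[ConditionOccurrence], concept_id: int) -> int:
    # Identical at every site because schema and concept IDs are shared;
    # only this number leaves the site.
    return len({r.person_id for r in rows if r.condition_concept_id == concept_id})


site_a = [to_cdm(1, "ICD10", "F03", date(2020, 1, 5)),
          to_cdm(2, "ICD10", "I10", date(2021, 3, 1))]
site_b = [to_cdm(7, "LOCAL_B", "DEM-01", date(2019, 6, 2)),
          to_cdm(7, "LOCAL_B", "DEM-01", date(2022, 2, 2))]
total = count_persons(site_a, 9000001) + count_persons(site_b, 9000001)
```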
OMOP CDM Approach – CDM
OMOP CDM Approach – Tools
https://atlas-demo.ohdsi.org/
• Analyses can be run with custom code, the OHDSI Methods Library in R, or ATLAS
• ATLAS is a free tool for analyzing standardized, patient-level observational data in CDM format
• ATLAS makes use of the OHDSI WebAPI
• Supports machine learning
Custom API Approach
• Application Programming Interface (API) gateways can be set up to serve as a black box over the data environment
• HTTP-based REST APIs, secured with an authentication key, can be used to pass specific data points over the internet
• Data are usually exchanged in JSON or XML format
• Example Link
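A minimal Python sketch of such a gateway follows. It assumes a framework-agnostic handler that is not tied to any web framework. The `X-Client-Id`/`X-API-Key` header names, the `/v1/counts/diagnosis` route, and the in-memory records are all hypothetical. The sketch shows two properties: callers must present a valid key, and only pre-defined aggregate views are exposed as JSON. Any other path is rejected.

```python
from __future__ import annotations

import hmac
import json

# Hypothetical key and records; real keys belong in a secrets store.
API_KEYS = {"partner-a": "s3cr3t-key-a"}
_RECORDS = [  # raw rows never leave this module
    {"age": 71, "diagnosis": "dementia"},
    {"age": 64, "diagnosis": "dementia"},
    {"age": 58, "diagnosis": "hypertension"},
]


def handle(path: str, headers: dict[str, str]) -> tuple[int, str]:
    """Authenticate, then answer only from an allow-list of aggregate views."""
    client = headers.get("X-Client-Id", "")
    key = headers.get("X-API-Key", "")
    expected = API_KEYS.get(client)
    # compare_digest avoids leaking key contents through timing differences.
    if expected is None or not hmac.compare_digest(expected.encode(), key.encode()):
        return 401, json.dumps({"error": "unauthorized"})
    if path == "/v1/counts/diagnosis":
        counts: dict[str, int] = {}
        for r in _RECORDS:
            counts[r["diagnosis"]] = counts.get(r["diagnosis"], 0) + 1
        return 200, json.dumps(counts)
    # Anything not explicitly exposed is unavailable.
    return 404, json.dumps({"error": "not found"})
```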
Dedicated Environment Approach
• In this approach, a data environment is connected to other related data environments through a dedicated network, or data access is provided through a dedicated environment
• Special authentication is provided for remote access
• Access is usually via a VPN (virtual private network) or directly over the internet
• Users access the data and tools through Virtual Desktop Infrastructure (VDI)
• Installation of special software might be required, depending on the setup
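The access rules above can be sketched in Python as an admission check. Everything here is an assumption for illustration: the VPN address range, the list of approved users, and the multi-factor flag standing in for the "special authentication". Every attempt is appended to an audit log, whether it succeeds or not.

```python
from __future__ import annotations

import ipaddress
from datetime import datetime, timezone

# Assumed values: the VPN assigns addresses from this range, and only
# these users are approved for the dedicated environment.
VPN_RANGE = ipaddress.ip_network("10.8.0.0/16")
APPROVED_USERS = {"researcher01", "researcher02"}

AUDIT_LOG: list[dict] = []


def admit(user: str, source_ip: str, mfa_passed: bool) -> bool:
    """Decide whether a virtual desktop session may start, and log the attempt."""
    in_vpn = ipaddress.ip_address(source_ip) in VPN_RANGE
    allowed = in_vpn and user in APPROVED_USERS and mfa_passed
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "source_ip": source_ip,
        "allowed": allowed,
    })
    return allowed
```

Logging refused attempts as well as granted ones lets auditors see probing or misuse, not just legitimate sessions.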
Comparison of Standalone Approaches
| Approach | Advantages | Challenges and Barriers |
|---|---|---|
| Linked Data | Uses open standards; facilitates inferencing over and querying of massive public datasets | Converting data from existing formats requires substantial effort; new concepts need periodic object/ontology modeling; dedicated training is needed |
| OMOP CDM | Uses a widely adopted data format; potential for international collaborations | Converting data from existing formats might require substantial effort; might not be useful if collaborators do not use the same approach |
| Custom API | Convenient; easiest to deploy; suitable for minimal data sharing | Ad hoc querying is not possible; large-scale data sharing is difficult |
| Dedicated Environment | Safest in terms of data security; facilitates detailed auditing | Higher setup cost; longer onboarding time |
Hybrid Approaches
• Some or all aspects of the standalone approaches can be combined to form hybrid approaches
• Example 1: OMOP CDM with a dedicated network, where health organizations, government bodies, and academic institutes are connected through a common data standard and a country-wide dedicated network
  • Partly implemented by the National University Health System (NUHS) in Singapore
• Example 2: Linked Data with APIs, where REST APIs encapsulate the querying complexities of SPARQL and RDF
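For Example 2, the Python sketch below shows the kind of translation a REST layer might perform. A caller supplies a plain book title, and the API builds the SPARQL query. The title is escaped so it cannot break out of the string literal. The endpoint path `GET /films?book=...` and the function names are hypothetical. The vocabulary terms (`dbo:basedOn`, `foaf:name`) are ones DBpedia uses.

```python
from __future__ import annotations


def sparql_string(value: str) -> str:
    # Escape as a SPARQL double-quoted literal: backslash first, then the
    # characters STRING_LITERAL2 forbids unescaped (quote, LF, CR).
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


QUERY_TEMPLATE = """PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?filmname WHERE {{
  ?book a dbo:Book ; foaf:name ?name .
  ?movie dbo:basedOn ?book ; foaf:name ?filmname .
  FILTER (str(?name) = {title})
}}"""


def films_based_on(title: str) -> str:
    """Body of a hypothetical GET /films?book=<title> endpoint: the caller
    passes a plain parameter and never sees SPARQL or RDF."""
    return QUERY_TEMPLATE.format(title=sparql_string(title))


print(films_based_on("Misery"))
```

The gateway would then send this query to the SPARQL endpoint and reshape the bindings into a simple JSON list before returning it. The caller only sees the JSON.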
Other Factors for Consideration
• Data anonymization
• Data versioning
• Provenance
• Security threats
• Standards and tools selection
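Two of these factors can be sketched briefly in Python. Anonymization is shown as small-cell suppression of released counts. Versioning and provenance are shown by attaching a dataset version and a hash of the query to each result. The threshold of 5 is an assumption, since custodians set their own. Suppressing individual cells is not sufficient on its own when totals allow the hidden value to be recomputed.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone

MIN_CELL = 5  # assumed threshold; each data custodian sets its own


def suppress_small_cells(counts: dict[str, int],
                         k: int = MIN_CELL) -> dict[str, int | None]:
    # Anonymization: hide non-zero counts below k, which could single out
    # individuals; None marks a suppressed cell.
    return {key: (n if n == 0 or n >= k else None) for key, n in counts.items()}


def release(counts: dict[str, int], dataset_version: str, query: str) -> dict:
    # Versioning and provenance: record which dataset snapshot and which
    # exact query (by SHA-256 hash) produced the released numbers.
    return {
        "result": suppress_small_cells(counts),
        "dataset_version": dataset_version,
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "released_at": datetime.now(timezone.utc).isoformat(),
    }
```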
THANK YOU

Editor's Notes

1. ICD: International Classification of Diseases; SNOMED: Systematized Nomenclature of Medicine