2. FEDERATED ANALYSIS (FA)
• Analysis of datasets or data sources that are present in different
geographic locations or networks
• Raw data generally not shared during analysis
• Statistical parameters or aggregated data are exchanged during
analysis
• The efficiency of FA projects is contingent on both communication
bandwidth and computational complexity
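The exchange of aggregated data described above can be sketched as follows. This is a minimal illustration, not a prescribed FA protocol: each site shares only a count, a sum, and a sum of squares, and a coordinator pools them into a mean and sample variance. The names `SiteSummary`, `summarize`, and `pooled_mean_and_variance`, and the sample values, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SiteSummary:
    """Aggregates a site is willing to share (no record-level rows)."""
    n: int            # number of records at the site
    total: float      # sum of the measured value
    total_sq: float   # sum of the squared values


def summarize(values: List[float]) -> SiteSummary:
    """Runs locally at each site; only the summary leaves the site."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_and_variance(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Runs at the coordinator, using only the exchanged aggregates."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two records in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from pooled sums: (sum(x^2) - n * mean^2) / (n - 1)
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Made-up values held at two sites; the raw lists never leave their site.
site_a = [70.0, 72.0, 74.0]
site_b = [80.0, 84.0]
mean, variance = pooled_mean_and_variance([summarize(site_a), summarize(site_b)])
```

Only three numbers per site cross the network here, which is why communication cost stays small even when the local datasets are large.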
4. Linked Data Approach
• Linked Data is based on Semantic
Web standards
• Data needs to be represented in the
Resource Description Framework
(RDF) format
• Each data item needs to be assigned a
URI (Uniform Resource Identifier)
• Relations and class hierarchy in the
datasets need to be represented
using Ontologies
• Data can be queried through SPARQL
endpoints
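To make the RDF requirements above concrete, here is a small sketch that stores data as subject-predicate-object triples identified by URIs and answers a simple pattern query. It uses plain Python rather than an RDF library, and the `http://example.org/` URIs, the `diagnosedWith` property, and the helper names are invented for illustration; only the `rdf:type` and `rdfs:subClassOf` URIs are standard.

```python
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Illustrative namespace; the example.org URIs are made up for this sketch.
EX = "http://example.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

triples = [
    (EX + "Dementia", RDF_TYPE, EX + "Disease"),
    (EX + "Disease", RDFS_SUBCLASS, EX + "MedicalCondition"),  # class hierarchy
    (EX + "Patient42", EX + "diagnosedWith", EX + "Dementia"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts like a query variable."""
    for t in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), t)):
            yield t


def to_ntriples(t: Triple) -> str:
    """Serialize one URI-only triple as a line of N-Triples."""
    return " ".join(f"<{term}>" for term in t) + " ."


# Who is diagnosed with Dementia? (roughly: SELECT ?s WHERE { ?s ex:diagnosedWith ex:Dementia })
diagnosed = [s for s, _, _ in match(p=EX + "diagnosedWith", o=EX + "Dementia")]
```

A real deployment would load such triples into a triple store and expose them through a SPARQL endpoint instead of matching them in memory.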
5. Linked Data Approach - Examples
https://dbpedia.org/page/Dementia
https://dbpedia.org/sparql
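As a rough sketch of how a client might talk to the endpoint listed above, the code below builds a SPARQL Protocol GET request and flattens a response in the SPARQL JSON results layout. The query, the helper names, and the hand-written `sample` response are assumptions for illustration; the request is built but not sent, so the sketch runs offline.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # endpoint URL from the slide

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Dementia> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request carrying the query string."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(payload: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Hand-written response in the SPARQL JSON results layout (not fetched from DBpedia).
sample = (
    '{"head": {"vars": ["label"]}, "results": {"bindings": ['
    '{"label": {"type": "literal", "xml:lang": "en", "value": "Dementia"}}]}}'
)
rows = bindings_to_rows(sample)
request = build_request(ENDPOINT, QUERY)
```

To run the query for real, the request could be passed to `urllib.request.urlopen` and the response body handed to `bindings_to_rows`.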
6. Linked Data Approach – Federated Querying Example
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT distinct ?name ?author ?filmname ?imdbID WHERE {
SERVICE <http://dbpedia.org/sparql> {
?book rdf:type dbo:Book .
?book foaf:name ?name .
?book dbp:author ?author .
?author foaf:name ?authname .
?book ^dbo:basedOn ?movie .
?movie a dbo:Film .
?movie foaf:name ?filmname
FILTER (str(?name) IN ("Royal Flash", "White Oleander", "Possession: A Romance", "Misery", "Intensity", "The War of The Roses", "Momo", "The Sicilian", "Derailed", "Ragtime"))
}
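SERVICE <http://data.linkedmdb.org/sparql> {
# ?lmdbFilm is the LinkedMDB film resource; a title literal cannot be the subject of a triple
?lmdbFilm foaf:page ?imdbID .
?lmdbFilm dc:title ?title .
FILTER(regex(str(?imdbID), "www.imdb.com" ) )
}
# Join the results of the two services on the film title
FILTER (str(?title) = str(?filmname))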
}
7. OMOP CDM Approach
• The Observational Medical Outcomes Partnership (OMOP) Common
Data Model (CDM) approach facilitates federated analysis using a
common data format
• Common data format includes concepts (terminologies), vocabularies,
and coding schemes
• The parent organization OHDSI provides a suite of tools for analysis
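The benefit of a common data format can be illustrated with a small sketch: because every site uses the same table and column names, one query text can run unchanged at each site and only the counts are combined. The schema below is a heavily simplified subset of the CDM `person` and `condition_occurrence` tables held in in-memory SQLite, the data is made up, and `CONDITION_OF_INTEREST` is a placeholder rather than a real vocabulary concept ID. The gender concept IDs 8507 and 8532 are used here as the male and female concepts and should be verified against the vocabulary in use.

```python
import sqlite3

# Simplified subset of two OMOP CDM tables; the real tables have many more columns.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER
);
"""

# The same query text is sent to every site because the schema is shared.
COUNT_BY_GENDER = """
SELECT p.gender_concept_id, COUNT(DISTINCT p.person_id)
FROM person p
JOIN condition_occurrence c ON c.person_id = p.person_id
WHERE c.condition_concept_id = ?
GROUP BY p.gender_concept_id
"""

CONDITION_OF_INTEREST = 1000001  # placeholder, not a real concept ID


def make_site(persons, conditions):
    """Create one site's local database (stands in for a remote CDM instance)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", persons)
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", conditions)
    return conn


def federated_counts(sites, concept_id):
    """Run the shared query at each site and add up the per-gender counts."""
    totals = {}
    for conn in sites:
        for gender, count in conn.execute(COUNT_BY_GENDER, (concept_id,)):
            totals[gender] = totals.get(gender, 0) + count
    return totals


site_a = make_site(
    [(1, 8507, 1940), (2, 8532, 1945)],
    [(1, 1, 1000001), (2, 2, 1000001), (3, 2, 1000001)],
)
site_b = make_site(
    [(1, 8532, 1950), (2, 8507, 1938)],
    [(1, 1, 1000001), (2, 2, 999)],
)
totals = federated_counts([site_a, site_b], CONDITION_OF_INTEREST)
```

Mapping local codes to the shared concept IDs is the conversion effort that has to happen at each site before a query like this becomes portable.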
9. OMOP CDM Approach – Tools
https://atlas-demo.ohdsi.org/
• Support for custom analyses and the OHDSI Methods Library, in R and ATLAS
• Free tool for analyzing standardized, patient-level, observational data
in the CDM format
• Makes use of OHDSI WebAPI
• Supports Machine Learning
10. Custom API Approach
• Application Programming Interface (API) gateways can be set up to
serve as a black box over the data environment
• HTTP-based REST APIs along with an authentication key can be used
to pass specific data points over the internet
• Data is usually shared in JSON or XML format
• Example Link
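A gateway of this kind might look roughly like the sketch below: a request handler checks an API key and returns only pre-approved aggregate data points as JSON. The route `/v1/aggregates/...`, the `X-API-Key` header name, the key, and the numbers are all hypothetical, and the handler is written as a plain function so it can be tried without starting a web server.

```python
import hmac
import json
from typing import Dict, Tuple

# Hypothetical key and made-up aggregates; a real deployment would load these securely.
API_KEY = "demo-key-123"
AGGREGATES = {"dementia_cases": {"year": 2020, "count": 128}}

PREFIX = "/v1/aggregates/"


def handle_get(path: str, headers: Dict[str, str]) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a GET request to the gateway."""
    supplied = headers.get("X-API-Key", "")
    # compare_digest avoids leaking key contents through timing differences
    if not hmac.compare_digest(supplied.encode(), API_KEY.encode()):
        return 401, json.dumps({"error": "invalid or missing API key"})
    name = path[len(PREFIX):] if path.startswith(PREFIX) else None
    if name not in AGGREGATES:
        return 404, json.dumps({"error": "unknown resource"})
    # Only the whitelisted data point is exposed; the underlying data stays hidden.
    return 200, json.dumps(AGGREGATES[name])


status, body = handle_get("/v1/aggregates/dementia_cases", {"X-API-Key": "demo-key-123"})
```

Because callers can only reach the routes the gateway defines, ad-hoc querying is not possible, which is the trade-off this approach accepts.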
11. Dedicated Environment Approach
• In this approach, data environments are connected to other related
data environments through a dedicated network, or data access is
provided through a dedicated environment
• Special authentication is provided for remote access purposes
• Usually facilitated via a virtual private network (VPN) or directly over
the internet
• Users access the data and tools using a Virtual Desktop Infrastructure (VDI)
• Installation of special software might be required depending on the
setup
12. Comparison of Standalone Approaches
Linked Data
• Advantages: Usage of open standards; Facilitates inferencing and
querying massive public datasets
• Challenges and Barriers: Data conversion from existing formats will be
a big effort; Periodic object/ontology modeling for new concepts;
Dedicated training
OMOP CDM
• Advantages: Usage of a universally accepted data format; Potential for
international collaborations
• Challenges and Barriers: Data conversion from existing formats might be
a big effort; Might not be useful if collaborators do not use the same
approach
Custom API
• Advantages: Convenient; Easiest to deploy; Suitable for minimal data
sharing
• Challenges and Barriers: Ad-hoc querying would not be possible;
Large-scale data sharing would be difficult
Dedicated Environment
• Advantages: Safest in terms of data security; Facilitates detailed
auditing
• Challenges and Barriers: Higher setup cost; Higher onboarding time
13. Hybrid Approaches
• Some or all aspects of the standalone approaches can be combined to
form hybrid approaches
• Example 1: OMOP CDM with Dedicated Network, where health
organizations, government bodies, and academic institutes are
connected using a common data standard and a country-wide
dedicated network
• Partly implemented by the National University Health System in
Singapore
• Example 2: Linked Data with API where REST APIs encapsulate the
querying complexities of SPARQL and RDF
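Example 2 could be sketched as a thin translation layer: the API accepts simple REST-style parameters and fills in a fixed SPARQL template, so callers never write SPARQL themselves. The function name `label_query`, the idea of a route such as `GET /labels/Dementia?lang=en`, and the choice of DBpedia resource URIs are assumptions for illustration.

```python
import re

TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {{
  <http://dbpedia.org/resource/{name}> rdfs:label ?label .
  FILTER (lang(?label) = "{lang}")
}}"""


def label_query(name: str, lang: str = "en") -> str:
    """Translate REST-style parameters into a SPARQL query string."""
    # Accept only simple local names so user input cannot inject SPARQL syntax.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("unsupported resource name")
    if not re.fullmatch(r"[a-z]{2,3}", lang):
        raise ValueError("unsupported language tag")
    return TEMPLATE.format(name=name, lang=lang)


# What a hypothetical GET /labels/Dementia?lang=en would send to the SPARQL endpoint.
query = label_query("Dementia", "en")
```

Keeping the template on the server side means the RDF model can change without breaking API clients, at the cost of supporting only the queries the API author anticipated.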
14. Other Factors for Consideration
• Data anonymization
• Data versioning
• Provenance
• Security threats
• Standards and tools selection