1/23
Pradeeban Kathiravelu (1,2), Yiru Chen (3), Ashish Sharma (4), Helena Galhardas (1), Peter Van Roy (2), Luís Veiga (1)
On-Demand Service-Based Big Data Integration: Optimized for Research Collaboration
The 3rd International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH), in conjunction with the 43rd International Conference on Very Large Data Bases.
Munich, Germany. September 1, 2017.
(1) INESC-ID / Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
(2) Université catholique de Louvain, Louvain-la-Neuve, Belgium
(3) Peking University, Beijing, China
(4) Department of Biomedical Informatics, Emory University, Atlanta, USA
2/23
Introduction
●
Scale and diversity of big data are rising.
–
Geographically distributed data of exabytes.
–
Structured, semi-structured, unstructured, or ill-formed data.
●
Integration of data is crucial for data science.
●
Sharing of integrated data and results.
–
Mandatory for reproducible research.
3/23
Challenges in Medical Research
for Big Data Integration
●
Multiple types of data.
–
Imaging, clinical, and genomic.
●
Numerous data sources.
–
No shared messaging protocol.
●
Do we really need to integrate all the data?
4/23
A Story of Medical Data Researchers...
5/23
●
Jim is interested in the effects of a medicine to treat brain tumors in patients of certain age groups.
6/23
Observation - 1
●
Various sources.
–
Service-based data access through APIs.
●
Thanks to specifications such as HL7 FHIR.
●
The researchers possess domain knowledge.
●
Integrate On-Demand.
–
Avoid eager loading of binary data or its textual metadata.
–
Use the researcher query as an input in loading data.
●
Scalable storage in-house.
–
Potential to load, integrate, index, and query unstructured data.
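The on-demand idea above can be sketched in a few lines. This is a minimal illustration, not the Óbidos implementation: `SOURCE`, `search`, `fetch_metadata`, and `load_on_demand` are hypothetical names, and an in-memory dictionary stands in for a remote service-based data source.

```python
# Hypothetical in-memory stand-in for a remote, service-based data source.
SOURCE = {
    "study-1": {"modality": "MR", "body_part": "BRAIN"},
    "study-2": {"modality": "CT", "body_part": "CHEST"},
    "study-3": {"modality": "MR", "body_part": "BRAIN"},
}

fetch_log = []  # records which entries were actually requested from the source


def search(query):
    """Simulate a source-side search: IDs of studies matching every key in query."""
    return [sid for sid, meta in SOURCE.items()
            if all(meta.get(key) == value for key, value in query.items())]


def fetch_metadata(study_id):
    """Simulate one API call that returns the metadata of a single study."""
    fetch_log.append(study_id)
    return SOURCE[study_id]


def load_on_demand(query, local_store):
    """Use the researcher query as input: load only the matching entries."""
    for study_id in search(query):
        if study_id not in local_store:
            local_store[study_id] = fetch_metadata(study_id)
    return local_store


store = load_on_demand({"modality": "MR", "body_part": "BRAIN"}, {})
print(sorted(store))  # ['study-1', 'study-3']
print(fetch_log)      # ['study-1', 'study-3']
```

An eager loader would have fetched all three studies; here the non-matching study is never requested.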
7/23
●
Paula has overlapping research interests with Jim.
8/23
Observation - 2
●
Load data only once per organization.
–
Bandwidth and storage efficiency.
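A small sketch of "load once per organization", assuming a store shared by all researchers of the organization. `OrganizationStore` and `fake_download` are hypothetical names introduced for illustration only.

```python
class OrganizationStore:
    """Hypothetical organization-wide store shared by all researchers."""

    def __init__(self, downloader):
        self._downloader = downloader  # callable that fetches one item from the source
        self._items = {}               # item ID -> loaded data
        self.downloads = 0             # number of real downloads performed

    def get(self, item_id):
        """Return an item, downloading it only if nobody has loaded it before."""
        if item_id not in self._items:
            self._items[item_id] = self._downloader(item_id)
            self.downloads += 1
        return self._items[item_id]


def fake_download(item_id):
    # Stand-in for a remote call; a real source would return metadata or binary data.
    return {"id": item_id}


org_store = OrganizationStore(fake_download)

jim_interest = ["series-a", "series-b", "series-c"]
paula_interest = ["series-b", "series-c", "series-d"]  # overlaps with Jim's

for item in jim_interest + paula_interest:
    org_store.get(item)

print(org_store.downloads)  # 4 downloads instead of 6
```

The two overlapping series are downloaded once and served locally afterwards, which is where the bandwidth and storage savings come from.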
9/23
●
Sharing the research data with researchers beyond organization boundaries.
10/23
Observation - 3
●
Do not duplicate data!
–
We "own" our interest, not the data.
●
Point to the data in the data sources.
–
Pointers to data like Dropbox Shared Links work well.
●
Avoids outdated duplicate data.
●
Easy to maintain.
●
APIs – Access the list of research data sets.
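The pointer idea can be illustrated as follows. This is a sketch under stated assumptions: `source`, `make_pointer_set`, and `resolve` are hypothetical, and the JSON layout is only one possible way to describe a researcher's interest.

```python
import json

# Hypothetical data source; the values stand in for large binary image series.
source = {"series-1": b"x" * 1_000_000, "series-2": b"y" * 2_000_000}


def make_pointer_set(owner, series_ids):
    """Describe a researcher's interest as identifiers, without copying any data."""
    return json.dumps({"owner": owner, "series": sorted(series_ids)})


def resolve(pointer_set):
    """Dereference the pointers against the source at access time."""
    ids = json.loads(pointer_set)["series"]
    return {series_id: source[series_id] for series_id in ids}


shared = make_pointer_set("jim", ["series-2", "series-1"])
data_volume = sum(len(blob) for blob in source.values())

print(len(shared) < data_volume)  # True: the pointers are far smaller than the data

# An update at the source is visible on the next resolve; no stale copy exists.
source["series-1"] = b"z" * 10
print(len(resolve(shared)["series-1"]))  # 10
```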
11/23
Problems
●
How to...
–
Load data from several service-based big data sources.
●
Avoid duplicate downloads and near-duplicate data.
–
Integrate disparate data and persist for future accesses.
–
Share pointers to data internally and externally.
12/23
Óbidos
On-demand Big Data Integration, Distribution, and Orchestration System
●
Researcher query → narrow down the search space.
●
Define subsets of data that are of interest.
–
Exploiting the well-defined hierarchical structure of medical data.
●
Medical Images (DICOM)
●
Clinical data
●
..
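To illustrate how a hierarchy narrows the search space, here is a minimal sketch. It assumes a collection → patient → study → series tree (DICOM itself defines the patient, study, and series levels); `ARCHIVE` and `select_series` are hypothetical names, and the identifiers are made up.

```python
# Hypothetical metadata tree: collection -> patient -> study -> list of series.
ARCHIVE = {
    "collection-A": {
        "patient-1": {"study-1": ["series-1", "series-2"]},
        "patient-2": {"study-2": ["series-3"]},
    },
    "collection-B": {
        "patient-3": {"study-3": ["series-4", "series-5"]},
    },
}


def select_series(archive, collections=None, patients=None, studies=None):
    """Return the series under the chosen branches; None means no restriction."""
    selected = []
    for collection, patient_map in archive.items():
        if collections is not None and collection not in collections:
            continue  # prune the whole collection without descending into it
        for patient, study_map in patient_map.items():
            if patients is not None and patient not in patients:
                continue
            for study, series_list in study_map.items():
                if studies is not None and study not in studies:
                    continue
                selected.extend(series_list)
    return selected


print(select_series(ARCHIVE, collections={"collection-A"}))
# ['series-1', 'series-2', 'series-3']
print(select_series(ARCHIVE, patients={"patient-3"}))
# ['series-4', 'series-5']
```

Selecting at a higher level of the tree defines a subset of interest without listing every series in it.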
13/23
Óbidos Approach
●
Hybrid of virtual and materialized data integration approaches.
–
Lazy load of metadata: Load the matching subset of metadata.
–
Store integrated data and query results → scalable storage.
●
Track already loaded data.
–
Near-duplicate detection.
–
Download only updates (changesets).
●
Efficient SQL queries on NoSQL storage.
●
Share pointers to the datasets rather than the dataset itself.
●
Generic design; implementation for medical research data.
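Tracking loaded data and downloading only updates can be sketched as a comparison of version stamps. The stamps, the `changeset` function, and the sample identifiers are assumptions of this sketch, not details taken from Óbidos.

```python
def changeset(source_versions, loaded_versions):
    """Compare source and local version stamps; return what must be downloaded.

    Both arguments map an item ID to a version stamp (for example, a
    last-modified counter). Returns (new_ids, updated_ids), each sorted.
    """
    new = [item for item in source_versions if item not in loaded_versions]
    updated = [item for item in source_versions
               if item in loaded_versions
               and source_versions[item] > loaded_versions[item]]
    return sorted(new), sorted(updated)


loaded = {"series-1": 3, "series-2": 1}                    # already in local storage
at_source = {"series-1": 3, "series-2": 2, "series-3": 1}  # current state of the source

new, updated = changeset(at_source, loaded)
print(new)      # ['series-3']
print(updated)  # ['series-2']
```

Only `series-2` and `series-3` need to be transferred; `series-1` is unchanged and stays as loaded.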
14/23
Óbidos Architecture
15/23
Evaluation
●
Evaluation Data:
–
Clinical data and DICOM imaging collections of TCIA.
●
Benchmark Óbidos against eager and lazy ETL.
–
Performance of loading and querying data.
●
Benchmark Óbidos (inter- and intra-organization) against binary data sharing.
–
Space/bandwidth efficiency of data sharing.
16/23
Workload Characterization
Various Entries in Evaluated Collections
17/23
Data load time
Change in total data volume (Same query and same interest)
●
Observation:
–
Load time increases for eager and lazy ETL with total volume.
–
Load time for Óbidos remains constant.
●
Total volume of data is irrelevant for Óbidos.
18/23
Data load time
Change in studies of interest (Same query and constant total data volume)
●
Observation:
–
Load time for eager and lazy ETL remains constant.
–
Load time for Óbidos increases with the number of studies of interest.
●
Converges to the load time of lazy ETL.
19/23
Query completion time
for the integrated data repository
●
Observation:
–
We assume the corresponding data is already loaded.
●
Thus, lazy and eager ETL perform similarly.
–
Indexed scalable NoSQL architecture of Óbidos → Better performance.
20/23
Efficiency in Sharing Medical Research Data
●
Observation:
–
A constant-size UID is sufficient for sharing within an organization.
–
Across organizations, the size of the Óbidos pointers grows with the number of series.
–
Traditional binary data sharing: shared data size = volume of the image series.
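The three sharing modes can be compared with a back-of-the-envelope estimate. The numbers are made up for illustration: the UID is assumed to be a UUID string, and `pointer_bytes` is an arbitrary per-series pointer size, not a measured value.

```python
import uuid


def shared_size_bytes(series_sizes, mode, pointer_bytes=64):
    """Estimate bytes transferred when sharing a set of image series.

    series_sizes: sizes in bytes of the image series of interest.
    pointer_bytes: assumed size of one series pointer (illustrative value).
    """
    if mode == "intra":   # one UID names the whole set, whatever its size
        return len(str(uuid.uuid4()))
    if mode == "inter":   # one pointer per series
        return pointer_bytes * len(series_sizes)
    if mode == "binary":  # the image data itself
        return sum(series_sizes)
    raise ValueError(f"unknown mode: {mode}")


sizes = [50_000_000] * 10  # ten series of 50 MB each (made-up numbers)
for mode in ("intra", "inter", "binary"):
    print(mode, shared_size_bytes(sizes, mode))
# intra 36
# inter 640
# binary 500000000
```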
21/23
Conclusion
●
Óbidos offers on-demand service-based big data integration.
–
Fast and resource-efficient data analysis.
–
SQL queries over NoSQL data store for the integrated data.
–
Efficient data sharing without duplicating actual data.
●
Future Work
–
Consume data from repositories of domains beyond medical data.
●
EUDAT
–
Óbidos distributed virtual data warehouses.
●
Leverage the proximity of the organizations in data integration and sharing.
22/23
Acknowledgements
●
NCI QIN grant (1U01CA187013, Resources for Development and Validation Of Radiomic Analyses and Adaptive Therapy).
●
Google Summer of Code (2014, 2015, and 2016).
●
The Cancer Imaging Archive (TCIA).
●
Tyk and API Umbrella Teams.
23/23
Thank you!
Questions?

More Related Content

What's hot

Mit401 data warehousing and data mining
Mit401  data warehousing and data miningMit401  data warehousing and data mining
Mit401 data warehousing and data mining
smumbahelp
 
Final presentation
Final presentationFinal presentation
Final presentation
Dave Nawazish Ali
 
Record matching over query results from Web Databases
Record matching over query results from Web DatabasesRecord matching over query results from Web Databases
Record matching over query results from Web Databases
tusharjadhav2611
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODS
Felipe Gutierrez
 
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
shivz3
 
3 dw architectures
3 dw architectures3 dw architectures
3 dw architectures
Claudia Gomez
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
Matteo Manca
 
EDI Training Module 11: Publishing Data in the EDI Repository
EDI Training Module 11:  Publishing Data in the EDI RepositoryEDI Training Module 11:  Publishing Data in the EDI Repository
EDI Training Module 11: Publishing Data in the EDI Repository
Environmental Data Initiative
 
Introduction to using REDCap for multi-site longitudinal research in medicine
Introduction to using REDCap for multi-site longitudinal research in medicineIntroduction to using REDCap for multi-site longitudinal research in medicine
Introduction to using REDCap for multi-site longitudinal research in medicine
Brian T. Edwards
 
Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)
Corinna Gries
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)Gregor Hagedorn
 
Ehr models, standards and semantic interoperability
Ehr models, standards and semantic interoperabilityEhr models, standards and semantic interoperability
Ehr models, standards and semantic interoperability
David Moner Cano
 
Data cloud lab version v.001.2020
Data cloud lab version v.001.2020Data cloud lab version v.001.2020
Data cloud lab version v.001.2020
mdcdwh
 
Archetype-based data transformation with LinkEHR
Archetype-based data transformation with LinkEHRArchetype-based data transformation with LinkEHR
Archetype-based data transformation with LinkEHR
David Moner Cano
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
Anita de Waard
 
Supporting Big Data, Open Data, Data Analytics and Data Science
Supporting Big Data, Open Data, Data Analytics and Data ScienceSupporting Big Data, Open Data, Data Analytics and Data Science
Supporting Big Data, Open Data, Data Analytics and Data Science
Simon Price
 
Types of databases
Types of databasesTypes of databases
Types of databasesPAQUIAAIZEL
 
iRODS User Group Meeting 2016 - MUMC+
iRODS User Group Meeting 2016 - MUMC+iRODS User Group Meeting 2016 - MUMC+
iRODS User Group Meeting 2016 - MUMC+
Maarten Coonen
 
EPSRC Policy Compliance: What researchers need to know
EPSRC Policy Compliance: What researchers need to knowEPSRC Policy Compliance: What researchers need to know
EPSRC Policy Compliance: What researchers need to know
Historic Environment Scotland
 
Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base
Leila Zemmouchi-Ghomari
 

What's hot (20)

Mit401 data warehousing and data mining
Mit401  data warehousing and data miningMit401  data warehousing and data mining
Mit401 data warehousing and data mining
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Record matching over query results from Web Databases
Record matching over query results from Web DatabasesRecord matching over query results from Web Databases
Record matching over query results from Web Databases
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODS
 
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
 
3 dw architectures
3 dw architectures3 dw architectures
3 dw architectures
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
 
EDI Training Module 11: Publishing Data in the EDI Repository
EDI Training Module 11:  Publishing Data in the EDI RepositoryEDI Training Module 11:  Publishing Data in the EDI Repository
EDI Training Module 11: Publishing Data in the EDI Repository
 
Introduction to using REDCap for multi-site longitudinal research in medicine
Introduction to using REDCap for multi-site longitudinal research in medicineIntroduction to using REDCap for multi-site longitudinal research in medicine
Introduction to using REDCap for multi-site longitudinal research in medicine
 
Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
 
Ehr models, standards and semantic interoperability
Ehr models, standards and semantic interoperabilityEhr models, standards and semantic interoperability
Ehr models, standards and semantic interoperability
 
Data cloud lab version v.001.2020
Data cloud lab version v.001.2020Data cloud lab version v.001.2020
Data cloud lab version v.001.2020
 
Archetype-based data transformation with LinkEHR
Archetype-based data transformation with LinkEHRArchetype-based data transformation with LinkEHR
Archetype-based data transformation with LinkEHR
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Supporting Big Data, Open Data, Data Analytics and Data Science
Supporting Big Data, Open Data, Data Analytics and Data ScienceSupporting Big Data, Open Data, Data Analytics and Data Science
Supporting Big Data, Open Data, Data Analytics and Data Science
 
Types of databases
Types of databasesTypes of databases
Types of databases
 
iRODS User Group Meeting 2016 - MUMC+
iRODS User Group Meeting 2016 - MUMC+iRODS User Group Meeting 2016 - MUMC+
iRODS User Group Meeting 2016 - MUMC+
 
EPSRC Policy Compliance: What researchers need to know
EPSRC Policy Compliance: What researchers need to knowEPSRC Policy Compliance: What researchers need to know
EPSRC Policy Compliance: What researchers need to know
 
Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base Authors' and Publications' Citations knowledge base
Authors' and Publications' Citations knowledge base
 

Similar to On-Demand Service-Based Big Data Integration: Optimized for Research Collaboration

Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
Jisc RDM
 
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET
 
Alain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersAlain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersIncisive_Events
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH
Philip Bourne
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019
Kees van Bochove
 
Unit 3.pdf
Unit 3.pdfUnit 3.pdf
Unit 3.pdf
Manisha Shinde
 
Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...
Carolyn Ten Holter
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data Stewardship
ICPSR
 
Data discovery and sharing at UCLH
Data discovery and sharing at UCLHData discovery and sharing at UCLH
Data discovery and sharing at UCLH
Jisc
 
EDI Training Module 4: Organizing Data Into Publishable Units
EDI Training Module 4: Organizing Data Into Publishable UnitsEDI Training Module 4: Organizing Data Into Publishable Units
EDI Training Module 4: Organizing Data Into Publishable Units
Environmental Data Initiative
 
How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...
ariadnenetwork
 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
CILIP MDG
 
Shifting the goal post – from high impact journals to high impact data
 Shifting the goal post – from high impact journals to high impact data Shifting the goal post – from high impact journals to high impact data
Shifting the goal post – from high impact journals to high impact data
CGIAR Research Program on Dryland Systems
 
AstraZeneca at Neo4j GraphSummit London 14Nov23.pptx
AstraZeneca at Neo4j GraphSummit London 14Nov23.pptxAstraZeneca at Neo4j GraphSummit London 14Nov23.pptx
AstraZeneca at Neo4j GraphSummit London 14Nov23.pptx
Neo4j
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
philipdurbin
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
Robert Grossman
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe
 
Big Data in Clinical Research
Big Data in Clinical ResearchBig Data in Clinical Research
Big Data in Clinical Research
Mike Hogarth, MD, FACMI, FACP
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
Anita de Waard
 

Similar to On-Demand Service-Based Big Data Integration: Optimized for Research Collaboration (20)

Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
 
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
 
Simon hodson
Simon hodsonSimon hodson
Simon hodson
 
Alain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersAlain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producers
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019
 
Unit 3.pdf
Unit 3.pdfUnit 3.pdf
Unit 3.pdf
 
Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data Stewardship
 
Data discovery and sharing at UCLH
Data discovery and sharing at UCLHData discovery and sharing at UCLH
Data discovery and sharing at UCLH
 
EDI Training Module 4: Organizing Data Into Publishable Units
EDI Training Module 4: Organizing Data Into Publishable UnitsEDI Training Module 4: Organizing Data Into Publishable Units
EDI Training Module 4: Organizing Data Into Publishable Units
 
How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...
 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
 
Shifting the goal post – from high impact journals to high impact data
 Shifting the goal post – from high impact journals to high impact data Shifting the goal post – from high impact journals to high impact data
Shifting the goal post – from high impact journals to high impact data
 
AstraZeneca at Neo4j GraphSummit London 14Nov23.pptx
AstraZeneca at Neo4j GraphSummit London 14Nov23.pptxAstraZeneca at Neo4j GraphSummit London 14Nov23.pptx
AstraZeneca at Neo4j GraphSummit London 14Nov23.pptx
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Big Data in Clinical Research
Big Data in Clinical ResearchBig Data in Clinical Research
Big Data in Clinical Research
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 

More from Pradeeban Kathiravelu, Ph.D.

Google Summer of Code_2023.pdf
Google Summer of Code_2023.pdfGoogle Summer of Code_2023.pdf
Google Summer of Code_2023.pdf
Pradeeban Kathiravelu, Ph.D.
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
Pradeeban Kathiravelu, Ph.D.
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
Pradeeban Kathiravelu, Ph.D.
 
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Pradeeban Kathiravelu, Ph.D.
 
Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021
Pradeeban Kathiravelu, Ph.D.
 
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
Pradeeban Kathiravelu, Ph.D.
 
Google Summer of Code (GSoC) 2020 for mentors
Google Summer of Code (GSoC) 2020 for mentorsGoogle Summer of Code (GSoC) 2020 for mentors
Google Summer of Code (GSoC) 2020 for mentors
Pradeeban Kathiravelu, Ph.D.
 
Google Summer of Code (GSoC) 2020
Google Summer of Code (GSoC) 2020Google Summer of Code (GSoC) 2020
Google Summer of Code (GSoC) 2020
Pradeeban Kathiravelu, Ph.D.
 
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data SourcesData Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Pradeeban Kathiravelu, Ph.D.
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
Pradeeban Kathiravelu, Ph.D.
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
 My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos... My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
Pradeeban Kathiravelu, Ph.D.
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
Pradeeban Kathiravelu, Ph.D.
 
UCL Ph.D. Confirmation 2018
UCL Ph.D. Confirmation 2018UCL Ph.D. Confirmation 2018
UCL Ph.D. Confirmation 2018
Pradeeban Kathiravelu, Ph.D.
 
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Pradeeban Kathiravelu, Ph.D.
 
Moving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routersMoving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routers
Pradeeban Kathiravelu, Ph.D.
 
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Pradeeban Kathiravelu, Ph.D.
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Pradeeban Kathiravelu, Ph.D.
 
Software-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big ServicesSoftware-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big Services
Pradeeban Kathiravelu, Ph.D.
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Pradeeban Kathiravelu, Ph.D.
 
Componentizing Big Services in the Internet
Componentizing Big Services in the InternetComponentizing Big Services in the Internet
Componentizing Big Services in the Internet
Pradeeban Kathiravelu, Ph.D.
 

More from Pradeeban Kathiravelu, Ph.D. (20)

Google Summer of Code_2023.pdf
Google Summer of Code_2023.pdfGoogle Summer of Code_2023.pdf
Google Summer of Code_2023.pdf
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
 
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
 
Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021
 
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
 
Google Summer of Code (GSoC) 2020 for mentors
Google Summer of Code (GSoC) 2020 for mentorsGoogle Summer of Code (GSoC) 2020 for mentors
On-Demand Service-Based Big Data Integration: Optimized for Research Collaboration

  • 1. On-Demand Service-Based Big Data Integration: Optimized for Research Collaboration
      Pradeeban Kathiravelu (1,2), Yiru Chen (3), Ashish Sharma (4), Helena Galhardas (1), Peter Van Roy (2), Luís Veiga (1)
      The 3rd International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH), in conjunction with the 43rd International Conference on Very Large Data Bases. Munich, Germany. September 1, 2017.
      (1) INESC-ID / Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
      (2) Université catholique de Louvain, Louvain-la-Neuve, Belgium
      (3) Peking University, Beijing, China
      (4) Department of Biomedical Informatics, Emory University, Atlanta, USA
  • 2. Introduction
      ● Scale and diversity of big data are rising.
          – Geographically distributed data at exabyte scale.
          – Structured, semi-structured, unstructured, or ill-formed data.
      ● Integration of data is crucial for data science.
      ● Sharing of integrated data and results.
          – Mandatory for reproducible research.
  • 3. Challenges in Medical Research for Big Data Integration
      ● Multiple types of data.
          – Imaging, clinical, and genomic.
      ● Numerous data sources.
          – No shared messaging protocol.
      ● Do we really need to integrate all the data?
  • 4. A Story of Medical Data Researchers...
  • 5. Jim is interested in the effects of a medicine that treats brain tumors in patients of certain age groups.
  • 6. Observation - 1
      ● Various sources.
          – Service-based data access through APIs.
              ● Thanks to specifications such as HL7 FHIR.
      ● The researchers possess domain knowledge.
      ● Integrate on demand.
          – Avoid eager loading of binary data or its textual metadata.
          – Use the researcher query as an input in loading data.
      ● Scalable storage in-house.
          – Potential to load, integrate, index, and query unstructured data.
  • 8. Observation - 2
      ● Load data only once per organization.
          – Bandwidth and storage efficiency.
  • 9. Sharing the research data with researchers beyond organization boundaries.
  • 10. Observation - 3
      ● Do not duplicate data!
          – We "own" our interest, not the data.
      ● Point to the data in the data sources.
          – Pointers to data, like Dropbox Shared Links, work well.
              ● Avoids outdated duplicate data.
              ● Easy to maintain.
      ● APIs
          – Access the list of research data sets.
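The idea of sharing pointers instead of copies can be sketched as follows. This is a minimal illustration, not the Óbidos implementation: the class names (`DataPointer`, `ResearchDataSet`, `DataSetRegistry`), the example URL, and the data set name are all hypothetical, and the registry is an in-memory stand-in for an API that lists research data sets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataPointer:
    """Reference to data that stays at its source (hypothetical structure)."""
    source_url: str  # base URL of the data source API (assumed for illustration)
    item_id: str     # identifier of the item within that source (assumed)

    def resolve(self) -> str:
        # Build the address a consumer would fetch; nothing is downloaded here.
        return f"{self.source_url.rstrip('/')}/{self.item_id}"


@dataclass
class ResearchDataSet:
    """A named collection of pointers: the 'interest', not the data itself."""
    name: str
    pointers: List[DataPointer] = field(default_factory=list)


class DataSetRegistry:
    """In-memory stand-in for an API that lists shared research data sets."""

    def __init__(self) -> None:
        self._sets: Dict[str, ResearchDataSet] = {}

    def publish(self, data_set: ResearchDataSet) -> None:
        self._sets[data_set.name] = data_set

    def list_data_sets(self) -> List[str]:
        return sorted(self._sets)

    def get(self, name: str) -> ResearchDataSet:
        return self._sets[name]


registry = DataSetRegistry()
registry.publish(ResearchDataSet(
    name="brain-tumor-age-40-60",
    pointers=[DataPointer("https://source.example.org/series/", "series-001")],
))
links = [p.resolve() for p in registry.get("brain-tumor-age-40-60").pointers]
```

Because a data set holds only references, updating the data at the source does not leave stale copies behind; consumers resolve the pointer when they need the data.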
  • 11. Problems
      ● How to...
          – Load data from several service-based big data sources.
              ● Avoid duplicate downloads and near-duplicate data.
          – Integrate disparate data and persist it for future accesses.
          – Share pointers to data internally and externally.
  • 12. Óbidos: On-demand Big Data Integration, Distribution, and Orchestration System
      ● Researcher query → Narrow down the search space.
      ● Define subsets of data that are of interest.
          – Exploiting the well-defined hierarchical structure of medical data.
              ● Medical Images (DICOM)
              ● Clinical data
              ● ..
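Narrowing the search space through a hierarchy can be sketched as below. The nesting collection → patient → study → series is an assumption chosen to resemble how DICOM imaging archives are commonly organized; the catalogue contents, the function name `select_series`, and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical metadata catalogue: collection -> patient -> study -> series IDs.
Catalogue = Dict[str, Dict[str, Dict[str, List[str]]]]


def select_series(
    catalogue: Catalogue,
    collection: str,
    patients: Optional[Set[str]] = None,
    studies: Optional[Set[str]] = None,
) -> List[str]:
    """Return the series IDs matching the query, pruning whole subtrees early."""
    selected: List[str] = []
    for patient_id, patient_studies in catalogue.get(collection, {}).items():
        if patients is not None and patient_id not in patients:
            continue  # skip every study and series of this patient
        for study_id, series_ids in patient_studies.items():
            if studies is not None and study_id not in studies:
                continue  # skip every series of this study
            selected.extend(series_ids)
    return selected


catalogue: Catalogue = {
    "collection-A": {
        "patient-1": {"study-1": ["series-1", "series-2"], "study-2": ["series-3"]},
        "patient-2": {"study-3": ["series-4"]},
    },
    "collection-B": {"patient-3": {"study-4": ["series-5"]}},
}

subset = select_series(catalogue, "collection-A", patients={"patient-1"})
```

Only the series in `subset` would then be candidates for loading; the other collections and patients are never touched.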
  • 13. Óbidos Approach
      ● Hybrid of virtual and materialized data integration approaches.
          – Lazy load of metadata: load the matching subset of metadata.
          – Store integrated data and query results → scalable storage.
      ● Track already loaded data.
          – Near-duplicate detection.
          – Download only updates (changesets).
      ● Efficient SQL queries on NoSQL storage.
      ● Share pointers to the datasets rather than the datasets themselves.
      ● Generic design; implementation for medical research data.
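Tracking already loaded data and downloading only updates can be sketched as follows. This is a simplified illustration under stated assumptions, not the Óbidos code: it assumes each item at the source exposes a version string, the class `OnDemandLoader` and the `fetch` callback are hypothetical, a plain dictionary stands in for the scalable storage, and near-duplicate detection is deliberately left out (only exact version matches are skipped).

```python
from typing import Callable, Dict, List


class OnDemandLoader:
    """Loads only items that are new or changed since the last load."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch                  # hypothetical call to a data source API
        self._store: Dict[str, dict] = {}    # stand-in for scalable storage
        self._versions: Dict[str, str] = {}  # item ID -> version already loaded

    def load(self, wanted: Dict[str, str]) -> List[str]:
        """`wanted` maps item ID -> current version at the source.

        Returns the IDs that were actually fetched in this call.
        """
        fetched: List[str] = []
        for item_id, version in wanted.items():
            if self._versions.get(item_id) == version:
                continue  # already loaded and unchanged: no download
            self._store[item_id] = self._fetch(item_id)
            self._versions[item_id] = version
            fetched.append(item_id)
        return fetched


calls: List[str] = []


def fake_fetch(item_id: str) -> dict:
    # Records each download so the example can show what was transferred.
    calls.append(item_id)
    return {"id": item_id}


loader = OnDemandLoader(fake_fetch)
first = loader.load({"s1": "v1", "s2": "v1"})
second = loader.load({"s1": "v1", "s2": "v2", "s3": "v1"})
```

In the second call only the changed item (`s2`) and the new item (`s3`) are fetched; `s1` is served from what the organization already holds.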
  • 15. Evaluation
      ● Evaluation data:
          – Clinical data and DICOM imaging collections of TCIA.
      ● Benchmark Óbidos against eager and lazy ETL.
          – Performance of loading and querying data.
      ● Óbidos (inter- and intra-organization) against binary data sharing.
          – Space/bandwidth efficiency of data sharing.
  • 17. Data load time: change in total data volume (same query and same interest)
      ● Observation:
          – Load time increases for eager and lazy ETL with total volume.
          – Load time for Óbidos remains constant.
              ● Total volume of data is irrelevant for Óbidos.
  • 18. Data load time: change in studies of interest (same query and constant total data volume)
      ● Observation:
          – Load time for eager and lazy ETL remains constant.
          – Load time increases for Óbidos with the interest.
              ● Converges to the load time of lazy ETL.
  • 19. Query completion time for the integrated data repository
      ● Observation:
          – We assume the corresponding data is already loaded.
              ● Thus, lazy and eager ETL perform similarly.
          – Indexed scalable NoSQL architecture of Óbidos → better performance.
  • 20. Efficiency in Sharing Medical Research Data
      ● Observation:
          – Intra-organization, a constant-size UID is sufficient.
          – Inter-organization, the Óbidos pointers grow with the number of series.
          – Traditional binary data sharing: shared data size = volume of the image series.
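The three sharing modes above can be compared with a toy size model. All byte counts below are made-up placeholders, not measurements from the evaluation; the function names and default sizes are hypothetical and only illustrate the growth behaviour described in the observation.

```python
from typing import List


def shared_bytes_intra(uid_bytes: int = 64) -> int:
    """Intra-organization: one constant-size UID, whatever the data volume."""
    return uid_bytes


def shared_bytes_inter(num_series: int, pointer_bytes: int = 128) -> int:
    """Inter-organization: one pointer per series, so size grows with the series count."""
    return num_series * pointer_bytes


def shared_bytes_binary(series_sizes: List[int]) -> int:
    """Traditional sharing: the full volume of the image series is transferred."""
    return sum(series_sizes)


# Three series of 200 MB each (illustrative values only).
series_sizes = [200_000_000, 200_000_000, 200_000_000]

intra = shared_bytes_intra()
inter = shared_bytes_inter(len(series_sizes))
binary = shared_bytes_binary(series_sizes)
```

Under these assumed sizes, the pointer-based modes stay in the range of bytes while binary sharing scales with the image volume.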
  • 21. Conclusion
      ● Óbidos offers on-demand service-based big data integration.
          – Fast and resource-efficient data analysis.
          – SQL queries over NoSQL data store for the integrated data.
          – Efficient data sharing without duplicating actual data.
      ● Future Work
          – Consume data from repositories of domains beyond medical data.
              ● EUDAT
          – Óbidos distributed virtual data warehouses.
              ● Leverage the proximity of the organizations in data integration and sharing.
  • 22. Acknowledgements
      ● NCI QIN grant (1U01CA187013, Resources for Development and Validation Of Radiomic Analyses and Adaptive Therapy).
      ● Google Summer of Code (2014, 2015, and 2016).
      ● The Cancer Imaging Archive (TCIA).
      ● Tyk and API Umbrella Teams.
  • 23. Thank you! Questions?