Managing Biomedical Data and Metadata in Large Scale Collaborations
November 28, 2018
Copyright ©2018. All Rights Reserved. Confidential Databiology Ltd.
Why Metadata?
▪ What is Metadata?
− Content
− Context
− Process
▪ Metadata is not always derived directly from the artefact;
it may be obtained from multiple sources
▪ Metadata semantics are key to unlocking
findability, provenance and usability of data
artefacts
Why Collaboration?
▪ Data continues to accumulate at an exponential rate
− Multiple efforts are capturing anything conceivable
− The line between study data and non-study data is blurring
▪ Demand for data continues to grow
− Everyone hungers for high-quality, consented biomedical datasets
− Regulation such as GDPR points to the need for a large-scale consent management capability
▪ Generating and storing all data in-house no longer makes business sense
Status Quo
▪ Data is produced in silos
− Specialized systems: clinical, prescriptions, lab,
imaging, sequencing, sensors, etc.
▪ There is no single warehouse of everything for
everyone
− For the foreseeable future there will always be
some (largish) degree of federation
− No single data science platform can cater to
everyone
▪ There is no single view on the data
− No use case needs all the data
− Each use case needs a unique combination of data
Obstacles to Collaboration
▪ Working with data
− Data Access
o Non-local data
o Data islands
o Multi-disciplinary
− Data Preparation
o Data normalization
o The data scientist grunt-work challenge
▪ Working together – sharing vs collaborating
− Involvement of different organizations
− Differing methods of processing
▪ Regulation, contracts and audit
The Common Approaches: Aggregation
▪ Aggregation: Central data warehouse with a corresponding API layer for querying
very large data sets quickly
▪ Common Challenges
− The line between data and metadata is blurred
− Scalability
− Cost
− Access controls
The Common Approaches: Standardization
▪ Standardization: Common Data Models and APIs to obtain
information from different custodians
▪ Common Challenges
− Many standards
− They are all in flux
− Big effort to implement and to maintain
− Coverage
Figure labels: Analytics Coverage, Standards Coverage
The Common Approaches: Federation
▪ Federation: Building on aggregation and standardization, query multiple data
custodians and deliver aggregate answers
▪ Common Challenges
− Standardizing queries
− Authentication / Authorization
− Normalization
− Performance
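The federation idea above (query several custodians, return only aggregate answers) can be sketched roughly as follows. This is a minimal in-memory illustration, not the system described in the slides: the `Custodian` class, the `federated_count` function and the record fields are hypothetical names, and authentication, query standardization and normalization are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Custodian:
    """Hypothetical data custodian that answers count queries locally."""
    name: str
    records: List[Record]

    def count(self, predicate: Callable[[Record], bool]) -> int:
        # Only the aggregate count leaves the custodian, never the records.
        return sum(1 for record in self.records if predicate(record))


def federated_count(custodians: List[Custodian],
                    predicate: Callable[[Record], bool]) -> Dict[str, object]:
    """Send the same query to every custodian and combine the counts."""
    per_custodian = {c.name: c.count(predicate) for c in custodians}
    return {"total": sum(per_custodian.values()), "per_custodian": per_custodian}
```

A caller would pass one predicate to all custodians, for example `federated_count(sites, lambda r: r["age"] >= 50)`, and receive a total plus a per-custodian breakdown.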
Metadata and Conway’s Law
“Organizations which design systems (in the broad sense) ... are constrained to
produce designs which are copies of the communication structures of these
organizations.”
Conway’s Law
Melvin Conway
Datamation, 1968
Metadata – Description of Data Artefacts
▪ One person's metadata is another person's data
▪ Collaborate and establish the broadest consensus for a given data
type
− Minimum viable standard metadata model across custodians
− Further enriched with contextual data specialized per study
− Requirements:
o Handling the presence of unexpected data as well as the absence of expected data
o Propagation of change and its impact on provenance
▪ The data model needs to be accommodating: ideally standardized
summary data with ad hoc extensions by interest
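One way to picture a minimum viable model with ad hoc extensions is a checker that separates agreed core fields from everything else and reports expected fields that are missing. The sketch below is an assumption-laden illustration: `CORE_MODEL`, its field names and `check_record` are hypothetical and do not come from the slides.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical minimum viable model: field name -> expected Python type.
CORE_MODEL: Dict[str, type] = {
    "sample_id": str,
    "data_type": str,
    "custodian": str,
}


def check_record(
    record: Dict[str, Any],
    core: Dict[str, type] = CORE_MODEL,
) -> Tuple[Dict[str, Any], Dict[str, Any], List[str]]:
    """Split a record into core fields and extensions, and list problems."""
    core_fields: Dict[str, Any] = {}
    problems: List[str] = []
    for name, expected in core.items():
        if name not in record:
            # Absence of expected data is reported, not silently ignored.
            problems.append(f"missing expected field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
        else:
            core_fields[name] = record[name]
    # Unexpected data is kept as a study-specific extension rather than rejected.
    extensions = {k: v for k, v in record.items() if k not in core}
    return core_fields, extensions, problems
```

Keeping unexpected fields as extensions, instead of failing validation, is one possible reading of "accommodating"; a stricter collaboration could choose to reject them.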
Metadata Aggregation Lifecycle
▪ Extract: Any combination of tools to extract data from one or many sources
− File Systems
− Files
− Databases
− APIs
▪ Translate: Prepare extracted native data fields for processing by DBE
▪ Validate: Validate metadata inputs against type constraints
▪ Annotate: Process data fields marked for annotation with ontology providers
▪ Store: Store validated and annotated data in DBE database
▪ Index: Index stored data in DBE search index
▪ Project: Projection of outputs directly into analysis frameworks or via API
Diagram labels: Data Sources, Importers, DBE Core Platform, Data Consumers; Distributed, Centralized
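The seven lifecycle stages can be sketched as a small in-memory pipeline. This is only an illustration of the stage order, not DBE code: the field mapping, type constraints, ontology lookup table and the `EX:0001` term identifier are all made-up stand-ins, and a list and a dict stand in for the database and the search index.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, object]

# Hypothetical stand-ins for importer configuration and an ontology provider.
FIELD_MAP = {"SampleID": "sample_id", "Tissue": "tissue"}
TYPES = {"sample_id": str, "tissue": str}
ONTOLOGY = {"liver": "EX:0001"}  # made-up term identifier


def extract(sources: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Extract: pull raw records from one or many sources."""
    for source in sources:
        yield from source


def translate(raw: Record) -> Record:
    """Translate: rename native fields to the names used downstream."""
    return {FIELD_MAP.get(key, key): value for key, value in raw.items()}


def validate(record: Record) -> bool:
    """Validate: check inputs against type constraints."""
    return all(isinstance(record.get(name), kind) for name, kind in TYPES.items())


def annotate(record: Record) -> Record:
    """Annotate: attach an ontology term where the lookup knows the value."""
    annotated = dict(record)
    term = ONTOLOGY.get(str(record["tissue"]).lower())
    if term is not None:
        annotated["tissue_term"] = term
    return annotated


def run(sources: Iterable[Iterable[Record]]) -> Tuple[List[Record], Dict[str, List[int]]]:
    """Store and Index: keep valid records and index them by tissue."""
    store: List[Record] = []
    index: Dict[str, List[int]] = {}
    for raw in extract(sources):
        record = translate(raw)
        if not validate(record):
            continue  # invalid records are skipped in this sketch
        record = annotate(record)
        store.append(record)
        index.setdefault(str(record["tissue"]), []).append(len(store) - 1)
    return store, index


def project(store: List[Record], index: Dict[str, List[int]], tissue: str) -> List[Record]:
    """Project: hand a selected view of the stored data to a consumer."""
    return [store[i] for i in index.get(tissue, [])]
```

Here invalid records are dropped silently; a real importer would more likely report them back to the source.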
Metadata Federation Lifecycle
Diagram components:
▪ Portal API
▪ Authentication
▪ Query Builder
▪ Query Federator
▪ Data Basket
▪ HL7 FHIR API
▪ Workspaces
▪ Cohort Management
▪ Importers
▪ DBE Core Platform
▪ Extract, Translate, Validate, Annotate, Store, Index, Project
▪ Federation Backends
▪ GA4GH Beacon API
Data as a function of other data
“Rien ne se perd, rien ne se crée, tout se transforme”
(“Nothing is lost, nothing is created, everything is transformed”)
Antoine-Laurent de Lavoisier
▪ Metadata describes not only the content of an artefact, but also the function
that created or transformed it
▪ Every data artefact is the result of one or more functions
− User
− Application Stack, Configuration, Version
− Infrastructure
− Data Dependencies
− Projections
o Inputs or Source
o Outputs (Data)
Essential for provenance, reproducibility and
consent operations
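Treating each artefact as the output of a function suggests a provenance record that captures the elements listed above, plus a way to find everything derived from a given source (for example when consent for that source is withdrawn). The sketch below is one hypothetical shape for such a record; the `Provenance` fields, `fingerprint` and `derived_from` are illustrative names, not part of any described platform.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record of the function that produced an artefact."""
    user: str
    application: str
    version: str
    configuration: Tuple[Tuple[str, str], ...]
    infrastructure: str
    inputs: Tuple[str, ...]  # identifiers of the artefacts this one depends on

    def fingerprint(self) -> str:
        # Same user, software, configuration and inputs give the same fingerprint.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def derived_from(source: str, catalog: Dict[str, Provenance]) -> Set[str]:
    """Return every artefact that depends, directly or transitively, on source."""
    affected: Set[str] = set()
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for artefact, provenance in catalog.items():
            if current in provenance.inputs and artefact not in affected:
                affected.add(artefact)
                frontier.append(artefact)
    return affected
```

With a catalog mapping artefact identifiers to `Provenance` records, `derived_from("reads.fastq", catalog)` would list the downstream artefacts touched by a change or a consent withdrawal on that input.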
Do You Have
Any Questions?
Databiology Ltd.
Magdalen Centre
The Oxford Science Park
Oxford, OX4 4GA
United Kingdom
+44-1865-784426
contactus@databiology.com
twitter.com/databiology
linkedin.com/company/databiology
databiology.com
Databiology Inc.
201 Spear Street, Suite 1100
San Francisco, CA 94105
USA
+1-415-426-3592
contactus@databiology.com
Contact us or follow us online!
Databiology Hong Kong Ltd.
Unit E, 6/F Golden Sun Centre
59-67 Bonham Street West
Sheung Wan, Hong Kong
Hong Kong (SAR)
+852-8193-4005
contactus@databiology.com

Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 

Managing Biomedical Data and Metadata in Large Scale Collaborations

  • 1. Managing Biomedical Data and Metadata in Large Scale Collaborations November 28, 2018
  • 2. Why Metadata?
       ▪ What is metadata?
         − Content
         − Context
         − Process
       ▪ Metadata is not always derived from the artefact directly, but is obtained from multiple sources
       ▪ Metadata semantics are key to unlocking findability, provenance and usability of data artefacts
  • 3. Why Collaboration?
       ▪ Data continues to be accumulated at an exponential rate
         − Multiple efforts are capturing anything conceivable
         − The line between study data and non-study data is blurring
       ▪ Demand for data continues to grow
         − Everyone hungers for high-quality, consented biomedical datasets
         − Regulation such as GDPR points to the need for large-scale consent management capability
       ▪ Generating and storing all data in-house no longer makes business sense
  • 4. Status Quo
       ▪ Data is produced in silos
         − Specialized systems: clinical, prescriptions, lab, imaging, sequencing, sensors, etc.
       ▪ Not one warehouse of everything for everyone
         − For the foreseeable future there will always be some (largish) degree of federation
         − No single data science platform can cater to everyone
       ▪ Not one view on the data
         − No use case needs all the data
         − Each use case needs a unique combination of data
  • 5. Obstacles to Collaboration
       ▪ Working with data
         − Data Access
           o Non-local data
           o Data islands
           o Multi-disciplinary
         − Data Preparation
           o Data normalization
           o The data scientist "grunt work" challenge
       ▪ Working together – sharing vs collaborating
         − Involvement of different organizations
         − Differing methods of processing
       ▪ Regulation, contracts and audit
  • 6. The Common Approaches: Aggregation
       ▪ Aggregation: Central data warehouse with corresponding API layer for querying very large data sets quickly
       ▪ Common Challenges
         − The line between data and metadata is blurred
         − Scalability
         − Cost
         − Access controls
  • 7. The Common Approaches: Standardization
       ▪ Standardization: Common Data Models and APIs to obtain information from different custodians
       ▪ Common Challenges
         − Many standards
         − They are all in flux
         − Big effort to implement and to maintain
         − Coverage
       [Figure labels: Analytics Coverage, Standards Coverage]
  • 8. The Common Approaches: Federation
       ▪ Federation: Building on aggregation and standardization, query multiple data custodians and deliver aggregate answers
       ▪ Common Challenges
         − Standardizing queries
         − Authentication / Authorization
         − Normalization
         − Performance
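The federation idea on the slide above (ask several custodians the same question, return only an aggregate answer) can be sketched roughly as follows. This is a minimal illustration, not Databiology's implementation: the `Custodian` class, its in-memory `records`, and the equality-only query format are all assumptions made for the example, and the challenges the slide lists (authentication, normalization, performance) are deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical query format: field name -> required value (equality only).
Query = dict[str, str]


@dataclass
class Custodian:
    """Stand-in for a remote data custodian; records stay local to it."""
    name: str
    records: list[dict[str, str]]

    def count(self, query: Query) -> int:
        # Count local records matching every field in the query.
        return sum(
            all(record.get(key) == value for key, value in query.items())
            for record in self.records
        )


def federated_count(custodians: list[Custodian], query: Query) -> dict:
    """Send the same query to each custodian and return only aggregates."""
    per_custodian = {c.name: c.count(query) for c in custodians}
    return {"total": sum(per_custodian.values()), "per_custodian": per_custodian}


if __name__ == "__main__":
    site_a = Custodian("site_a", [{"tissue": "liver"}, {"tissue": "lung"}])
    site_b = Custodian("site_b", [{"tissue": "liver"}])
    print(federated_count([site_a, site_b], {"tissue": "liver"}))
```

Only counts leave each custodian in this sketch; the individual records never do, which is the property that makes federation attractive when data cannot be centralized.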
  • 9. Metadata and Conway’s Law
       “Organizations which design systems (in the broad sense) ... are constrained to produce designs which are copies of the communication structures of these organizations.”
       Conway’s Law, Melvin Conway, Datamation, 1968
  • 10. Metadata – Description of Data Artefacts
       ▪ One person's metadata is another person's data
       ▪ Collaborate and establish the broadest consensus for a given data type
         − Minimum viable standard metadata model across custodians
         − Further enriched with contextual data specialized per study
         − Requirements:
           o Handling the presence of unexpected data as well as the absence of expected data
           o Propagation of change and its impact on provenance
       ▪ The data model needs to be accommodating: ideally standardized summary data with ad hoc extensions by interest
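One way to picture the "minimum viable model plus ad hoc extensions" idea from the slide above is a record that separates agreed core fields from everything else, and that reports missing core fields instead of failing. The sketch below is illustrative only: the field names in `CORE_FIELDS` and the `MetadataRecord` structure are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical minimum viable core agreed across custodians.
CORE_FIELDS = ("artefact_id", "data_type", "custodian")


@dataclass
class MetadataRecord:
    core: dict[str, Any]                                        # agreed fields that were present
    extensions: dict[str, Any] = field(default_factory=dict)    # unexpected, study-specific fields
    missing: list[str] = field(default_factory=list)            # expected fields that were absent


def build_record(raw: dict[str, Any]) -> MetadataRecord:
    """Split raw metadata into core fields, extensions, and missing core fields."""
    core = {name: raw[name] for name in CORE_FIELDS if name in raw}
    missing = sorted(name for name in CORE_FIELDS if name not in raw)
    extensions = {k: v for k, v in raw.items() if k not in CORE_FIELDS}
    return MetadataRecord(core=core, extensions=extensions, missing=missing)


if __name__ == "__main__":
    record = build_record({"artefact_id": "a1", "data_type": "vcf", "tissue": "liver"})
    print(record)
```

Keeping unexpected fields in `extensions` rather than discarding them, and listing absent fields in `missing` rather than rejecting the record, covers both requirements named on the slide without forcing every custodian onto one full schema.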
  • 11. Metadata Aggregation Lifecycle
       Stages: Extract, Translate, Validate, Annotate, Store, Index, Project
       ▪ Extract: Any combination of tools to extract data from one or many sources
         − File Systems
         − Files
         − Databases
         − APIs
       ▪ Translate: Prepare extracted native data fields for processing by DBE
       ▪ Validate: Validate metadata inputs against type constraints
       ▪ Annotate: Process data fields marked for annotation with ontology providers
       ▪ Store: Store validated and annotated data in DBE database
       ▪ Index: Index stored data in DBE search index
       ▪ Project: Projection of outputs directly into analysis frameworks or via API
       [Diagram labels: Data Sources, Importers, DBE Core Platform, Data Consumers; Distributed, Centralized]
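The seven lifecycle stages above can be read as a simple pipeline. The following is a rough sketch of that flow using in-memory dictionaries in place of a real database and search index. Nothing here reflects the actual DBE API: `FIELD_MAP`, `TYPE_CONSTRAINTS`, the placeholder ontology identifier `EX:0001`, and every function name are assumptions introduced for illustration.

```python
from typing import Any, Iterable, Iterator

Record = dict[str, Any]

# All three tables are hypothetical examples, not DBE configuration.
FIELD_MAP = {"SampleID": "sample_id", "Age": "age", "Tissue": "tissue"}
TYPE_CONSTRAINTS = {"sample_id": str, "age": int, "tissue": str}
ONTOLOGY = {"liver": "EX:0001"}  # placeholder term, not a real ontology ID


def extract(sources: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Extract: pull native records from one or many sources."""
    for source in sources:
        yield from source


def translate(native: Record) -> Record:
    """Translate: rename native fields to the platform's field names."""
    return {FIELD_MAP.get(key, key): value for key, value in native.items()}


def validate(record: Record) -> list[str]:
    """Validate: check fields against type constraints; return error messages."""
    errors = []
    for name, expected in TYPE_CONSTRAINTS.items():
        if name not in record:
            errors.append(f"missing {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"{name} is not {expected.__name__}")
    return errors


def annotate(record: Record) -> Record:
    """Annotate: attach an ontology term for the tissue field."""
    term = ONTOLOGY.get(str(record.get("tissue", "")).lower())
    return {**record, "tissue_term": term}


def run_pipeline(sources: Iterable[Iterable[Record]]):
    store: dict[str, Record] = {}      # Store: keyed by sample_id
    index: dict[Any, set[str]] = {}    # Index: ontology term -> sample_ids
    rejected = []
    for native in extract(sources):
        record = translate(native)
        errors = validate(record)
        if errors:
            rejected.append((native, errors))
            continue
        record = annotate(record)
        store[record["sample_id"]] = record
        index.setdefault(record["tissue_term"], set()).add(record["sample_id"])
    return store, index, rejected


def project(store: dict[str, Record], index: dict[Any, set[str]], term: str) -> list[Record]:
    """Project: hand matching records to a consumer, here as a plain list."""
    return [store[sample_id] for sample_id in sorted(index.get(term, ()))]
```

Records that fail validation are collected in `rejected` with their error messages rather than silently dropped, so a custodian can see why a record never reached the store.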
  • 12. Metadata Federation Lifecycle
       [Diagram labels: Portal, API, Authentication, Query Builder, Query Federator, Data Basket, HL7 FHIR API, Workspaces, Cohort Management; Importers; DBE Core Platform (Extract, Translate, Validate, Annotate, Store, Index, Project); Federation Backends; GA4GH Beacon API]
  • 13. Data as a function of other data
       “Rien ne se perd, rien ne se crée, tout se transforme” (“Nothing is lost, nothing is created, everything is transformed”)
       Antoine-Laurent de Lavoisier
       ▪ Metadata describes not only the content of an artefact, but also the function that created / transformed the artefact
       ▪ Every data artefact is the result of one or more functions
         − User
         − Application Stack, Configuration, Version
         − Infrastructure
         − Data Dependencies
         − Projections
           o Inputs or Source
           o Outputs (Data)
       Essential for provenance, reproducibility and consent operations
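Treating each artefact as the output of a function suggests recording that function alongside the data. The sketch below shows one possible shape for such a record and how it could support a consent operation, namely finding everything derived from a given input. The `Provenance` fields mirror the list on the slide, but the class itself, the `fingerprint` helper, and the traversal are assumptions for illustration, not the presenters' design.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class Provenance:
    """Hypothetical record of the function that produced some artefacts."""
    user: str
    application: str
    version: str
    configuration: dict
    infrastructure: str
    inputs: tuple[str, ...]    # artefact IDs consumed
    outputs: tuple[str, ...]   # artefact IDs produced


def fingerprint(record: Provenance) -> str:
    """Deterministic hash of the record, useful for reproducibility checks."""
    payload = json.dumps(asdict(record), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def derived_from(artefact_id: str, records: list[Provenance]) -> set[str]:
    """Return every artefact transitively derived from artefact_id."""
    affected: set[str] = set()
    frontier = [artefact_id]
    while frontier:
        current = frontier.pop()
        for record in records:
            if current in record.inputs:
                for output in record.outputs:
                    if output not in affected:
                        affected.add(output)
                        frontier.append(output)
    return affected
```

If consent for a source artefact is withdrawn, `derived_from` gives the set of downstream artefacts that would need review, which is only possible because inputs and outputs were recorded for every transformation.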
  • 14. Do You Have Any Questions?
  • 15. Contact us or follow us online!
       ▪ Databiology Ltd.
         Magdalen Centre, The Oxford Science Park, Oxford, OX4 4GA, United Kingdom
         +44-1865-784426
         contactus@databiology.com
       ▪ Databiology Inc.
         201 Spear Street, Suite 1100, San Francisco, CA 94105, USA
         +1-415-426-3592
         contactus@databiology.com
       ▪ Databiology Hong Kong Ltd.
         Unit E, 6/F Golden Sun Centre, 59-67 Bonham Street West, Sheung Wan, Hong Kong, Hong Kong (SAR)
         +852-8193-4005
         contactus@databiology.com
       ▪ Online
         − twitter.com/databiology
         − linkedin.com/company/databiology
         − databiology.com