European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Standardisation in BMS European infrastructures
Managing Big Data Workshop
Setting the standards for analyzing and integrating big data
ELIXIR Hub technical coordinator
July 9-10, 2014, Berlin, Germany
TOC
• ELIXIR
• Standards
• BioMedBridges workshops update
  • Standards
  • Data deluge
ELIXIR
• European life sciences research infrastructure for biological information to facilitate research
• Safeguard data and build sustainable data services
• Involves major bioinformatics service providers and is supported by 17 EU member states
• Creating a robust infrastructure for biological information is a bigger task than any individual organisation or nation can take on alone
[Figure 2: Together, the biomedical science research infrastructures address societal challenges]
BioMedBridges
Biomedical sciences research infrastructures: stronger through common links
• FP7-funded cluster project
• 21 partners in 9 countries
• Computational ‘data and service’ bridges between the BMS RIs
• Interoperability between data and services in the biological, medical, translational and clinical domains
Rafael C Jimenez
ELIXIR Hub technical coordinator
Standards
[Figure: data submission/access, ideally vs. in reality. Legend: A = Annotator, DB = Database, QI = Query Interface, User]
Data resources in life sciences
• Many
• Diverse
• Dispersed
NAR online Molecular Biology Database Collection 2014: ~1800 molecular biology data resources
[Figure (Tim Hubbard): scientific impact vs. utility of databases; labels "Too little information" and "Many, diverse & dispersed databases and interfaces"]
Data integration
Combining data residing in different sources … providing users with a unified view of these data.
[Figure: ideally / compromise / reality. Legend: DB = Database, I = Interface, User. Reality: many, diverse & dispersed databases and interfaces]
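The "unified view" in the definition above can be illustrated with a minimal Python sketch, assuming every source already uses the same identifier for the same entity. The source names, identifiers (protA, protB, protC) and fields are hypothetical placeholders, and `unified_view` is an illustrative helper, not an existing tool.

```python
# Minimal sketch of a "unified view": records from independent sources are
# grouped under a shared identifier. Source names, identifiers and fields are
# hypothetical placeholders, not real database content.
SOURCES: dict[str, dict[str, dict]] = {
    "expression": {"protA": {"tissue": "liver"}, "protB": {"tissue": "brain"}},
    "interactions": {"protA": {"partners": ["protB"]}, "protC": {"partners": []}},
}


def unified_view(sources: dict[str, dict[str, dict]]) -> dict[str, dict[str, dict]]:
    """Group every source's record under its identifier, keeping provenance.

    Fields are nested per source name, so two sources using the same field
    name do not silently overwrite each other.
    """
    view: dict[str, dict[str, dict]] = {}
    for source_name, records in sources.items():
        for identifier, record in records.items():
            view.setdefault(identifier, {})[source_name] = record
    return view


if __name__ == "__main__":
    view = unified_view(SOURCES)
    print(view["protA"])
```

The sketch only works because the sources share identifiers. When they do not, a mapping step between identifier schemes has to come first, which is why identifiers and standards recur throughout the rest of this talk.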
[Figure: scientific impact vs. utility of bioinformatics; labels "Too little bioinformatics" and "Integration of …"]
Data integration issues
Many data sources
• Maintain and update
• New ones appearing
• Many vanishing*
• Where to find them?
• Redundant data?
Different query interfaces
Variable results
• Syntax
• Semantics
• Minimum information
* Merali Z. et al. Databases in peril. Nature 2005.
Standards
• Community-agreed specifications for how data types should be represented and described.
• Standards facilitate:
  • Interoperability
  • Integration
  • Exchange
  • Portability
  • Comparison
  • Representation
  • Sharing
  • Replication
  • Consistency
  • Verification
  • Compliance
  • Reusability
  • Access
  • Submission
  • Analysis
  • Editing
  • Visualization
  • Conversion
  • Validation
  • Annotation
  • Search
[Figure: data integration, heterogeneous vs. homogeneous integration]
Improving links between distributed European resources
ELIXIR pilot: interoperability of protein expression resources
The Human Protein Atlas portal is a publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 46 different normal human tissues and 20 different cancer types, as well as 47 different human cell lines.
Standards
[Figure: standards around data, grouped as Definition, Representation and Access; components: schema, interfaces, guidelines, ontologies, format, identifiers]
• Not just a format …
Molecular interactions (Definition, Representation, Access)
• Schema: PSI-MI
• Interfaces: PSICQUIC
• Guidelines: MIMIx/IMEx
• Ontologies: PSI-MI CV
• Format: XML/TAB
• Identifiers: IMEx/UniProt
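To make the Representation layer concrete, here is a minimal Python sketch that reads one line of PSI-MI TAB 2.5 (MITAB, the "TAB" in XML/TAB). It assumes the 15-column MITAB 2.5 layout, cells written as `db:value(description)` with `|` between multiple values and `-` for empty cells. It does not handle later MITAB versions or `|` characters inside quoted values. The example row is made up (PROTA, PROTB and the publication ID are placeholders), and the function names are illustrative.

```python
import re
from typing import Optional

# Column order of PSI-MI TAB 2.5 (MITAB25): 15 tab-separated columns.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_author", "publications", "taxid_a", "taxid_b",
    "interaction_types", "source_dbs", "interaction_ids", "confidence",
]
# Free-text columns that do not follow the db:value(description) convention.
FREE_TEXT_COLUMNS = {"first_author"}

# Value after "db:", optionally double-quoted, optionally followed by "(description)".
_VALUE_RE = re.compile(r'^(?:"(?P<quoted>[^"]*)"|(?P<plain>[^(]*))(?:\((?P<desc>.*)\))?$')

Entry = tuple[str, str, Optional[str]]


def parse_field(field: str) -> list[Entry]:
    """Split one MITAB cell into (database, value, description) entries."""
    if field == "-":  # "-" marks an empty cell
        return []
    entries: list[Entry] = []
    for item in field.split("|"):  # multiple values are pipe-separated
        db, _, rest = item.partition(":")  # split on the first colon only
        match = _VALUE_RE.match(rest)
        if match is None:
            raise ValueError(f"cannot parse MITAB value: {item!r}")
        value = match.group("quoted")
        if value is None:
            value = match.group("plain")
        entries.append((db, value, match.group("desc")))
    return entries


def parse_mitab25_line(line: str) -> dict:
    """Parse one MITAB 2.5 line into a dict keyed by column name."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(MITAB25_COLUMNS):
        raise ValueError(f"expected {len(MITAB25_COLUMNS)} columns, got {len(cells)}")
    return {
        name: cell if name in FREE_TEXT_COLUMNS else parse_field(cell)
        for name, cell in zip(MITAB25_COLUMNS, cells)
    }


# A made-up example row; identifiers and publication ID are placeholders.
EXAMPLE = "\t".join([
    "uniprotkb:PROTA", "uniprotkb:PROTB", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe J et al. (2014)", "pubmed:0000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "intact-miscore:0.5",
])
```

The terms in the detection-method and interaction-type columns come from the PSI-MI CV, and the interactor IDs come from identifier namespaces such as UniProt. One line of MITAB therefore touches several of the standard types listed above at once.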
Standards in data sharing
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
Different formats for the same data
[Figure: the same molecular interaction (MI) data represented as PSI-XML, PSI-MITAB, BioPAX, RDF, Cytoscape and DAS]
• Comprehensive
• Simple
• Generic
• Domain specific
• Structured
http://biosharing.org
Standards (formats, guidelines, ontologies) and databases
Registry - Identifiers
Registry - Minimum information guidelines
Registry - Controlled vocabularies
• Ontology browser: http://www.ebi.ac.uk/ontology-lookup
Ontology Lookup Service
Communities organized per domain
• Produce technical standards intended to address the needs of a community of users
• Develop, coordinate, promulgate, revise, amend, reissue, interpret
ELIXIR role
• Support communities developing standards
• Encourage communication among communities
• Establish links between standards
• Promote the adoption of standards
• Help find gaps between standards
• Recommend best practices for standards in data sharing
Data deluge & standards
BMB workshops update
Knowledge Exchange Workshop: WP3 Standards
24-25 June 2014, VUMC, Amsterdam, The Netherlands
• Best practice for identifiers
• Development of the BMB standards registry
Identifiers Best Practice - purpose
• Recommendations for identifiers best practice
  • Designing (format, re-use)
  • Managing (creation, versioning, provenance, deprecation etc.)
  • Using (resolving, mapping etc.)
• Publish a paper
  • Introduction to identifier concepts
  • Case studies illustrating identifier usage in real-world scenarios
  • Recommendations on best practice
  • Show, not tell
  • Descriptive, not normative
  • For non-experts/newcomers
• Gap analysis
  • List the biological entities and identifier types used by BMB partners
Identifiers Best Practice – topics 1/2
• Identifier formats
  • syntax of database IDs, URI patterns
• Identifier management
  • creation, versioning, provenance, deprecation
• Identifier resolution
  • how to use an ID to get useful information about the entity
  • services for this, e.g. Identifiers.org
  • what information should be given?
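Identifier format and resolution can be sketched together: check a local ID against its namespace's syntax, then build a resolvable URI for it. The Python below is a minimal sketch. It assumes UniProt's published accession pattern, a simplified digits-only PubMed pattern, and the identifiers.org compact-identifier form `https://identifiers.org/{prefix}:{id}`, which should be checked against identifiers.org itself. `REGISTRY` and `resolvable_uri` are hypothetical names, not an identifiers.org API.

```python
import re

# Hypothetical local registry: namespace prefix -> pattern for valid local IDs.
# The UniProt pattern follows UniProt's published accession format; the
# PubMed pattern (digits only) is a simplifying assumption.
REGISTRY: dict[str, re.Pattern] = {
    "uniprot": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "pubmed": re.compile(r"[0-9]+"),
}
# Assumed resolver form for compact identifiers (prefix:accession).
RESOLVER = "https://identifiers.org"


def resolvable_uri(prefix: str, local_id: str) -> str:
    """Validate a local ID against its namespace pattern and build a resolver URI."""
    pattern = REGISTRY.get(prefix)
    if pattern is None:
        raise KeyError(f"unknown namespace prefix: {prefix!r}")
    if pattern.fullmatch(local_id) is None:
        raise ValueError(f"{local_id!r} does not match the {prefix} pattern")
    return f"{RESOLVER}/{prefix}:{local_id}"


if __name__ == "__main__":
    print(resolvable_uri("uniprot", "P69905"))
```

Keeping the syntax check next to the namespace prefix means a malformed ID is caught before anyone tries to resolve it. This is the kind of information an identifier catalogue would need to publish for each namespace.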
Identifiers Best Practice – topics 2/2
• Identifier mapping / aggregation
  • how to map IDs of entries in one resource to those in another, to assign equivalence / make useful links
  • e.g. IDs for equivalent protein sequences in different organisms
  • e.g. probe->gene->pathway->function (GO)
• Identifier cataloguing
  • compilations of types of identifiers (e.g. EDAM ontology) or specific identifiers (e.g. Cell Line Ontology)
  • what information should be given?
• Case studies
  • use of IDs in a particular domain
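A mapping chain such as probe->gene->pathway->function amounts to composing one-to-many lookup tables. The Python below is a minimal sketch of that composition. All identifiers are made up, and `compose` is a hypothetical helper. In practice each table would come from a different resource.

```python
from typing import Callable, Iterable

# Hypothetical mapping tables; every identifier here is made up for illustration.
PROBE_TO_GENE = {"probe_001": ["GENE_A"], "probe_002": ["GENE_A", "GENE_B"]}
GENE_TO_PATHWAY = {"GENE_A": ["PATHWAY_X"], "GENE_B": ["PATHWAY_X", "PATHWAY_Y"]}
PATHWAY_TO_GO = {"PATHWAY_X": ["GO_TERM_1"], "PATHWAY_Y": ["GO_TERM_2"]}

Mapping = dict[str, list[str]]


def compose(*mappings: Mapping) -> Callable[[Iterable[str]], set[str]]:
    """Chain one-to-many mappings; IDs with no entry in a table are dropped."""

    def apply(ids: Iterable[str]) -> set[str]:
        current = set(ids)
        for mapping in mappings:
            current = {target for source in current for target in mapping.get(source, [])}
        return current

    return apply


probe_to_function = compose(PROBE_TO_GENE, GENE_TO_PATHWAY, PATHWAY_TO_GO)

if __name__ == "__main__":
    print(sorted(probe_to_function(["probe_002"])))
```

Each hop can fan out, and IDs missing from a table disappear without a warning. A real mapping service would probably want to keep the intermediate path and report unmapped IDs, which ties back to the provenance topic above.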
Standards format registry - purpose
• Discovery, Agreement, Benchmarking, …
• Facilitate syntactic interoperability across research infrastructures, so that samples and data can be integrated and analysed across ESFRI BMS domains.
Standards format registry - topics
• Catalogue of standards
• Interoperability among registries
  • identifiers.org, BioSharing, EDAM, ELIXIR service registry
• Adapt to user needs
  • Community of users vs. community of producers
• Microstandards
• Standards mapping
• Access to expert knowledge
• Assess fitness for purpose
• Rating/metrics
BioMedBridges Knowledge Exchange Workshop
Tuesday 24 to Wednesday 25 June 2014, VUMC, Amsterdam, The Netherlands
Workshop organised by WP3 (Standards Description and Harmonisation) to bring together BMB partners, biomedical standards experts and representatives of external projects.
Best practice for identifiers
• Gap analysis of current identifiers
Development of the BMB standards registry
• Gap analysis for usage of the registry
• Integration of the registry with other tools
BioMedBridges workshop
E-Infrastructure support for the life sciences: Preparing for the data deluge
15 May 2014, Genome Campus, Hinxton, UK
Knowledge exchange workshop
• Discussion of big data challenges in life sciences
• Focus on a few representative domains
• Looking 5 years ahead
• Jointly identify potential solutions to our problems
[Figure: data between LS (life sciences: physical facilities, scientific information) and ICT (e-infrastructures: transfer, computation, storage)]
How does it affect data sharing in life sciences?
Large-scale data sharing in the life sciences
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
How does big data affect data sharing?
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
[Figure: compute, storage and transfer needs: what, how, where]
Growing data
Guy Cochrane, EMBL-EBI
Cost of DNA sequencing
Data generation vs. data transfer
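Data produced in 24 hours and network file transfer time at different bandwidths:
Source | Data per 24 hours | 1 Gb/s | 100 Mb/s | 10 Mb/s
DNA sequencing | ~100 GB | ~30 min | ~5 hours | ~2 days
Mass spectrometry | ~4 TB | ~9 hours | ~4 days | ~5 weeks
Microscopy | ~4 TB | ~9 hours | ~4 days | ~5 weeks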
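Transfer times like these can be sanity-checked by dividing the data size in bits by the link bandwidth. The Python below is a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bit/s) and a fully used link with no protocol overhead, so it gives a lower bound. `transfer_time_seconds` is a hypothetical helper name.

```python
def transfer_time_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Ideal time to move size_bytes over a link of the given nominal bandwidth.

    Assumes decimal units and a fully utilised link with no protocol
    overhead, so real transfers will take longer than this.
    """
    return size_bytes * 8 / bandwidth_bits_per_s


TB = 10**12  # decimal terabyte, in bytes

if __name__ == "__main__":
    for label, bandwidth in [("1 Gb/s", 1e9), ("100 Mb/s", 1e8), ("10 Mb/s", 1e7)]:
        seconds = transfer_time_seconds(4 * TB, bandwidth)
        print(f"4 TB over {label}: {seconds / 3600:.1f} h = {seconds / 86400:.1f} days")
```

For 4 TB this gives about 8.9 hours at 1 Gb/s, 3.7 days at 100 Mb/s and 37 days (just over 5 weeks) at 10 Mb/s. That matches the ~9 hour, ~4 day and ~5 week figures on the slide. Real-world transfers are typically slower than this ideal.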
Bottlenecks in Life Sciences?
• Data production grows faster than storage
• The cost of data production technologies declines faster than the cost of storage
• It takes longer to transfer data than to produce it
Data growth
how to reduce the IT budget shortfall?
http://www.eweek.com/
Optimization
Using technology more effectively
Selecting relevant data
Potential solutions
• Storage
  • Data compression
  • Select what we store
  • Evaluate data reproducibility & value of data
• Network
  • Faster protocols
  • Partitioning
  • Network upgrade
• Computation
  • Clouds
  • Data close to computation
Data compression
• Efficient representation
• Capacity for controlled data reduction
• Efficient transformations
• Tool chain
[Figure: CRAM, trade-off between precision and compression]
Fritz M.H., Leinonen R., et al. (2011) Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 21(5), 734-40.
Cochrane G., Cook C.E. and Birney E. (2012) The future of DNA sequence archiving. GigaScience 1:2.
http://www.ebi.ac.uk/ena/about/cram_toolkit
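The core idea behind reference-based compression (Fritz et al. 2011) is that an aligned read can be stored as its position on a reference plus the few bases where it differs. The Python below illustrates only that idea. It is not the CRAM format or the CRAM toolkit API. It handles substitutions only (no insertions, deletions, quality scores or unmapped reads) and uses made-up sequences. `EncodedRead`, `encode` and `decode` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedRead:
    position: int  # 0-based start of the read on the reference
    length: int
    substitutions: tuple[tuple[int, str], ...]  # (offset in read, base) where read != reference


def encode(read: str, reference: str, position: int) -> EncodedRead:
    """Store an already-aligned read as position, length and mismatching bases."""
    ref_segment = reference[position:position + len(read)]
    if len(ref_segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    subs = tuple(
        (i, base) for i, (base, ref_base) in enumerate(zip(read, ref_segment)) if base != ref_base
    )
    return EncodedRead(position, len(read), subs)


def decode(encoded: EncodedRead, reference: str) -> str:
    """Rebuild the read by copying the reference and re-applying substitutions."""
    bases = list(reference[encoded.position:encoded.position + encoded.length])
    for offset, base in encoded.substitutions:
        bases[offset] = base
    return "".join(bases)


REFERENCE = "ACGTACGTTAGCCGATAGGCTTAACG"  # made-up reference sequence
READ = "GTTAGCAGAT"  # made-up read aligned at position 6, one mismatch

if __name__ == "__main__":
    encoded = encode(READ, REFERENCE, 6)
    print(encoded)
    print(decode(encoded, REFERENCE) == READ)
```

Reads that match the reference closely shrink to little more than a position and a length. That is why the approach pays off for resequencing data, and why the reference itself must stay available to decode the data.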
Data transfer optimization
• e.g. getting more from the available bandwidth
Guy Cochrane, EMBL-EBI
Data partitioning
• Organisation of data around biological concepts
• Indexing system around these concepts
• Support for requests for partitions along this index
Reference-oriented indexing
Guy Cochrane, EMBL-EBI
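Partitioning data around a reference and serving requests along that index can be illustrated with a toy binned index. The Python below is a minimal sketch, not how any EMBL-EBI service is implemented. It assumes each record has a single start position on a named reference sequence and uses a hypothetical 1,000-position bin width. `ReferenceIndex` and the record IDs are illustrative.

```python
from collections import defaultdict

BIN_SIZE = 1_000  # hypothetical bin width in reference coordinates


class ReferenceIndex:
    """Toy index that partitions records by (reference name, position bin)."""

    def __init__(self, bin_size: int = BIN_SIZE) -> None:
        self.bin_size = bin_size
        self._bins: dict[tuple[str, int], list[tuple[int, str]]] = defaultdict(list)

    def add(self, reference: str, position: int, record_id: str) -> None:
        """File a record under the bin that contains its start position."""
        self._bins[(reference, position // self.bin_size)].append((position, record_id))

    def query(self, reference: str, start: int, end: int) -> list[str]:
        """Return IDs of records starting in [start, end), ordered by position.

        Only the bins overlapping the requested region are scanned, so a
        request touches one partition of the data rather than all of it.
        """
        hits: list[tuple[int, str]] = []
        for b in range(start // self.bin_size, (end - 1) // self.bin_size + 1):
            for position, record_id in self._bins.get((reference, b), []):
                if start <= position < end:
                    hits.append((position, record_id))
        return [record_id for _, record_id in sorted(hits)]


if __name__ == "__main__":
    index = ReferenceIndex()
    index.add("chr1", 150, "read_1")
    index.add("chr1", 1_250, "read_2")
    index.add("chr2", 200, "read_3")
    print(index.query("chr1", 1_000, 2_000))
```

Bin width is the main design choice. Small bins make regional requests cheap but increase the number of partitions to manage, while large bins do the opposite.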
What data is relevant?
Life sciences diversity
[Figure: scales of life sciences data: genomes, nucleotides, transcripts, proteins, complexes, pathways, small molecules, structures, domains, cells, biobanks, tissues and organs, human populations, therapies, disease prevention, early diagnosis, human individuals]
Life sciences diversity
• Different communities
• Some similar requirements
• Not always the same solutions
Proteomics, Metabolomics, Clinical data, Genomics, Imaging
Some conclusions
• Opportunity for e-infrastructures to better understand BMS RI problems.
• Identification of bottlenecks
• Discussion of some potential solutions
• Data growth will change how we do things today
• Different communities -> different models -> some common solutions
• Solutions have to come from use cases
• BMS RIs need to define their requirements better
• We need to use technology more efficiently
• The BMS community has to evaluate the practicality of storing everything
• Privacy issues make big data more challenging
• Big data is difficult to separate from computation
• Shortage of expertise in dealing with scientific data and IT services
Thank you
Data deposition
Data submission
[Figure: raw data, processed data and metadata deposited in a centralized database]
Data sharing
• From: data on my disk and available to anyone who requests it
• To: submission to data repositories
Data submissions
[Figure: data submission workflows linking journal submission, data repository submission, journal requests, curators and a data management plan; data management + data sharing]
Will big data affect data deposition?
• How much data?
• How much available data?
BMB Resource Integration Workshop
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
Proteomics repositories integration using EUDAT resources
Proteomics repositories integration using EUDAT resourcesProteomics repositories integration using EUDAT resources
Proteomics repositories integration using EUDAT resources
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
Summary of Technical Coordinators discussions
Summary of Technical Coordinators discussionsSummary of Technical Coordinators discussions
Summary of Technical Coordinators discussions
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
The European life-science data infrastructure: Data, Computing and Services ...
The European life-science data infrastructure: Data, Computing and Services ...The European life-science data infrastructure: Data, Computing and Services ...
The European life-science data infrastructure: Data, Computing and Services ...
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
ELIXIR TCG update
ELIXIR TCG updateELIXIR TCG update
ELIXIR TCG update
 
An introduction to programmatic access
An introduction to programmatic accessAn introduction to programmatic access
An introduction to programmatic access
 
Life science requirements from e-infrastructure: initial results from a joint...
Life science requirements from e-infrastructure:initial results from a joint...Life science requirements from e-infrastructure:initial results from a joint...
Life science requirements from e-infrastructure: initial results from a joint...
 
Technical activities in ELIXIR Europe
Technical activities in ELIXIR EuropeTechnical activities in ELIXIR Europe
Technical activities in ELIXIR Europe
 
Challenges of big data. Summary day 1.
Challenges of big data. Summary day 1.Challenges of big data. Summary day 1.
Challenges of big data. Summary day 1.
 
Challenges of big data. Aims of the workshop.
Challenges of big data. Aims of the workshop.Challenges of big data. Aims of the workshop.
Challenges of big data. Aims of the workshop.
 
Data submissions and archiving raw data in life sciences. A pilot with Proteo...
Data submissions and archiving raw data in life sciences. A pilot with Proteo...Data submissions and archiving raw data in life sciences. A pilot with Proteo...
Data submissions and archiving raw data in life sciences. A pilot with Proteo...
 
ELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciencesELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciences
 
SASI, A lightweight standard for exchanging course information
SASI, A lightweight standard for exchanging course informationSASI, A lightweight standard for exchanging course information
SASI, A lightweight standard for exchanging course information
 
Introduction to the BioJS project
Introduction to the BioJS projectIntroduction to the BioJS project
Introduction to the BioJS project
 
BioJS introduction
BioJS introductionBioJS introduction
BioJS introduction
 
Java tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular InteractionsJava tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular Interactions
 

Recently uploaded

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 

Recently uploaded (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 

Standards Key to Unlocking Biological Data Integration

  • 1. European Life Sciences Infrastructure for Biological Information (www.elixir-europe.org). Standardisation in BMS European infrastructures. Managing Big Data Workshop: Setting the standards for analyzing and integrating big data. ELIXIR Hub technical coordinator. July 9-10 2014, Berlin, Germany
  • 2. TOC • ELIXIR • Standards • BioMedBridges workshops update: standards, data deluge
  • 3. ELIXIR • European life sciences research infrastructure for biological information to facilitate research • Safeguard data and build sustainable data services • Involves major bioinformatics service providers and is supported by 17 EU member states • Creating a robust infrastructure for biological information is a bigger task than any individual organisation or nation can take on alone
  • 4. [Figure 2: Together, the biomedical science research infrastructures address societal challenges] BioMedBridges: Biomedical sciences research infrastructures stronger through common links • FP7-funded cluster project • 21 partners in 9 countries • Computational 'data and service' bridges between the BMS RIs • Interoperability between data and services in the biological, medical, translational and clinical domains
  • 5. European Life Sciences Infrastructure for Biological Information (www.elixir-europe.org). Rafael C. Jimenez, ELIXIR Hub technical coordinator. Standards
  • 6. [Diagram: data submission/access. Ideally, annotators (A) submit to a single database (DB) with one query interface (QI) for users; in reality, there are many databases, each with its own query interface.]
  • 7. Data resources in life science • Many • Diverse • Dispersed. The NAR online Molecular Biology Database Collection 2014 lists ~1800 molecular biology data resources
  • 8. [Plot: scientific impact against utility of databases; labels: 'too little information'; 'many, diverse & dispersed databases and interfaces'. Credit: Tim Hubbard]
  • 9. Data integration: combining data residing in different sources … providing users with a unified view of these data. [Diagram: ideal, compromise and real-world arrangements of databases (DB), interfaces (I) and users]
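
The "unified view" on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming every source returns records as dictionaries that share an identifier field (`id` here); the source names, fields and merge rule (first non-empty value wins, contributing sources kept as provenance) are hypothetical choices, not how any ELIXIR or BioMedBridges service actually integrates data.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def unified_view(sources: Dict[str, List[Record]], key: str = "id") -> Dict[str, Dict[str, object]]:
    """Merge records from several sources into one entry per identifier.

    For each field the first non-empty value encountered wins; the names of
    all sources that contributed a record are kept under 'sources'.
    """
    view: Dict[str, Dict[str, object]] = {}
    provenance: Dict[str, List[str]] = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            entity_id = record[key]
            merged = view.setdefault(entity_id, {key: entity_id})
            for field, value in record.items():
                if value and not merged.get(field):
                    merged[field] = value
            if source_name not in provenance[entity_id]:
                provenance[entity_id].append(source_name)
    for entity_id, merged in view.items():
        merged["sources"] = provenance[entity_id]
    return view
```

Keying the merge on a shared identifier is also what collapses redundant records into a single entry, which is one reason identifier practice matters for integration.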
  • 10. [Plot: scientific impact against utility of bioinformatics; labels: 'too little bioinformatics'; 'many, diverse & dispersed databases and interfaces'; 'Integration of …']
  • 11. Data integration issues • Many data sources: maintain and update; new ones appearing; many vanishing*; where to find them? • Different query interfaces: data integration? • Variable results: syntax, semantics, minimum information • Redundant data? *Merali Z. et al. Databases in peril. Nature 2005.
  • 12. Standards • Community-agreed specification for how data types should be represented and described. • Standards facilitate: interoperability, integration, exchange, portability, comparison, representation, sharing, replication, consistency, verification, compliance, reusability, access, submission, analysis, editing, visualization, conversion, validation, annotation, search
  • 14. Improving links between distributed European resources. ELIXIR pilot: interoperability of protein expression resources. The Human Protein Atlas portal is a publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 46 different normal human tissues and 20 different cancer types, as well as 47 different human cell lines.
  • 17. Standards in data sharing http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
  • 18. Different formats for the same data: molecular interaction (MI) data as PSI-XML, PSI-MITAB, BioPAX, RDF, Cytoscape, DAS • Comprehensive • Simple • Generic • Domain specific • Structured
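
To show what "different formats for the same data" means in practice, here is a minimal sketch that reads one line of PSI-MITAB 2.5 (a 15-column, tab-separated format in which multiple values are separated by `|` and `-` marks an empty column) and re-expresses it as a simple subject/predicate/object triple. The parser is a simplification (it ignores escaping of `|` inside quoted values), the triple is an ad-hoc shape rather than BioPAX or any RDF vocabulary, and the protein identifiers in the test are placeholders, not a curated interaction.

```python
import re
from typing import Dict, List, Optional, Tuple, Union

# The 15 columns of a PSI-MITAB 2.5 line, in order.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence",
]

# One cross-reference: database:identifier(optional text); the identifier may be quoted.
XREF = re.compile(r'^([^:]+):("[^"]*"|[^(]*)(?:\((.*)\))?$')

Xref = Tuple[str, str, Optional[str]]
Row = Dict[str, Union[str, List[Xref]]]


def parse_field(field: str) -> List[Xref]:
    """Split a '|'-separated MITAB field into (database, identifier, text) tuples."""
    if field == "-":
        return []  # '-' marks an empty column
    xrefs: List[Xref] = []
    for value in field.split("|"):
        match = XREF.match(value)
        if match is None:
            raise ValueError(f"unparseable MITAB value: {value!r}")
        database, identifier, text = match.groups()
        xrefs.append((database, identifier.strip('"'), text))
    return xrefs


def parse_mitab25(line: str) -> Row:
    """Parse one tab-separated MITAB 2.5 line into named columns."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(MITAB25_COLUMNS):
        raise ValueError(f"expected {len(MITAB25_COLUMNS)} columns, got {len(columns)}")
    row: Row = {}
    for name, value in zip(MITAB25_COLUMNS, columns):
        # The first-author column is free text, not database:identifier pairs.
        row[name] = value if name == "first_author" else parse_field(value)
    return row


def to_triple(row: Row) -> Tuple[str, str, str]:
    """Re-express the interaction as a simple (subject, predicate, object) triple."""
    db_a, id_a, _ = row["id_a"][0]
    db_b, id_b, _ = row["id_b"][0]
    types = row["interaction_types"]
    predicate = (types[0][2] or types[0][1]) if types else "interacts with"
    return f"{db_a}:{id_a}", predicate, f"{db_b}:{id_b}"
```

The same content could be serialized as PSI-XML or BioPAX; the point is that converting between formats is only reliable when both sides use the same identifiers and controlled-vocabulary terms.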
  • 21. Registry - Minimum information guidelines
  • 22. Registry: controlled vocabularies • Ontology browser: Ontology Lookup Service, http://www.ebi.ac.uk/ontology-lookup
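
Minimum information guidelines and controlled vocabularies pay off when submissions are checked against them. The sketch below assumes a hypothetical four-field checklist and a two-term excerpt of the PSI-MI interaction detection method vocabulary; a real validator would load the full vocabulary (for example, via a service such as the Ontology Lookup Service) instead of hard-coding it.

```python
from typing import Dict, List

# Hypothetical minimum-information checklist for an interaction submission.
REQUIRED_FIELDS = ["interactor_a", "interactor_b", "detection_method", "publication"]

# Small excerpt of a controlled vocabulary: PSI-MI interaction detection methods.
DETECTION_METHODS: Dict[str, str] = {
    "MI:0018": "two hybrid",
    "MI:0019": "coimmunoprecipitation",
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED_FIELDS if not record.get(field)]
    term = record.get("detection_method")
    if term and term not in DETECTION_METHODS:
        problems.append(f"unknown detection method term: {term}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a submission tool report everything a submitter needs to fix in one pass.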
  • 23. Communities organized per domain • Produce technical standards intended to address the needs of a community of users: develop, coordinate, promulgate, revise, amend, reissue and interpret them
  • 24. ELIXIR role • Support communities developing standards • Encourage communication among communities • Link standards to one another • Promote the adoption of standards • Help find the gaps between standards • Recommend best practices for using standards in data sharing
  • 25. European Life Sciences Infrastructure for Biological Information (www.elixir-europe.org). Data deluge & standards: BMB workshops update
  • 26. Knowledge Exchange Workshop: WP3 Standards, 24-25 June 2014, VUMC, Amsterdam, The Netherlands • Best practice for identifiers • Development of the BMB standards registry
  • 27. Identifiers Best Practice: purpose • Recommendations for identifiers best practice • Designing (format, re-use) • Managing (creation, versioning, provenance, deprecation etc.) • Using (resolving, mapping etc.) • Publish a paper • Introduction to identifier concepts • Case studies illustrating identifier usage in real-world scenarios • Recommendations on best practice • Show, not tell • Descriptive, not normative • For non-experts/newcomers • Gap analysis • List the biological entities and identifier types used by BMB partners
  • 28. Identifiers Best Practice: topics 1/2 • Identifier formats • Syntax of database IDs, URI patterns • Identifier management • Creation, versioning, provenance, deprecation • Identifier resolution • How to use an ID to get useful information about the entity • Services for this, e.g. Identifiers.org • What information should be given?
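
Identifier resolution of the kind offered by Identifiers.org comes down to turning a namespace prefix plus a local accession into a resolvable URL, after checking the accession's syntax. A minimal sketch, assuming the compact-identifier form `prefix:accession` resolved at `https://identifiers.org/prefix:accession`. The pattern table is a hypothetical, deliberately simplified local stand-in for the registry's own per-namespace patterns, and namespaces whose accessions embed the prefix (such as GO) are not handled.

```python
import re
from typing import Dict, Tuple

RESOLVER = "https://identifiers.org/"

# Hypothetical, simplified per-namespace accession patterns (the real registry
# keeps its own, stricter pattern for each namespace).
PATTERNS: Dict[str, str] = {
    "taxonomy": r"\d+",
    "pubmed": r"\d+",
    "uniprot": r"[A-Z0-9]{6,10}",
}


def parse_compact(curie: str) -> Tuple[str, str]:
    """Split 'prefix:accession' into a lower-cased prefix and the accession."""
    prefix, sep, accession = curie.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {curie!r}")
    return prefix.lower(), accession


def resolve_url(curie: str) -> str:
    """Validate the accession syntax and build a resolver URL."""
    prefix, accession = parse_compact(curie)
    pattern = PATTERNS.get(prefix)
    if pattern is None:
        raise KeyError(f"unknown namespace: {prefix}")
    if re.fullmatch(pattern, accession) is None:
        raise ValueError(f"{accession!r} does not match the {prefix} pattern")
    return f"{RESOLVER}{prefix}:{accession}"
```

Checking syntax before resolving catches malformed identifiers at submission time, long before a user follows a broken link.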
  • 29. Identifiers Best Practice: topics 2/2 • Identifier mapping/aggregation • How to map IDs of entries in one resource to those in another, to assign equivalence or make useful links • e.g. IDs for equivalent protein sequences in different organisms • e.g. probe -> gene -> pathway -> function (GO) • Identifier cataloguing • Compilations of types of identifiers (e.g. EDAM ontology) or of specific identifiers (e.g. Cell Line Ontology) • What information should be given? • Case studies • Use of IDs in a particular domain
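
A chain such as probe -> gene -> pathway -> function is a composition of many-to-many mappings. The sketch below assumes each mapping is available as a dictionary from an identifier to a set of identifiers; all identifiers in the test are hypothetical placeholders, not real probe, gene, pathway or GO accessions.

```python
from typing import Dict, Iterable, Set

Mapping = Dict[str, Set[str]]


def map_through(ids: Iterable[str], *mappings: Mapping) -> Set[str]:
    """Follow each mapping in turn; identifiers with no mapping are dropped."""
    current = set(ids)
    for mapping in mappings:
        current = {target for source in current for target in mapping.get(source, set())}
    return current
```

Because every step can be many-to-many, results widen quickly. A production mapping service would also record which path produced each link (provenance), which this sketch omits.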
  • 30. Standards format registry: purpose • Discovery, agreement, benchmarking, … • Facilitate syntactic interoperability across research infrastructures so that samples and data can be integrated and analysed across ESFRI BMS domains.
  • 31. Standards format registry: topics • Catalogue of standards • Interoperability among registries (identifiers.org, BioSharing, EDAM, ELIXIR service registry) • Adapt to user needs • Community of users vs. community of producers • Microstandards • Standards mapping • Access to expert knowledge • Assess fitness for purpose • Rating/metrics
  • 32. BioMedBridges Knowledge Exchange Workshop, Tuesday 24 - Wednesday 25 June 2014, VUMC, Amsterdam, The Netherlands. Workshop organised by WP3 (Standards Description and Harmonisation) to bring together BMB partners, biomedical standards experts and representatives of external projects. • Best practice for identifiers: gap analysis of current identifiers • Development of the BMB standards registry: gap analysis for usage of the registry; integration of the registry with other tools
  • 38. BioMedBridges workshops • Knowledge Exchange Workshop: WP3 Standards, 24-25 June 2014, VUMC, Amsterdam, The Netherlands • E-Infrastructure support for the life sciences: Preparing for the data deluge, 15 May 2014, Genome Campus, Hinxton, UK
  • 39. BioMedBridges workshop E-Infrastructure support for the life sciences: Preparing for the data deluge 15 May, 2014 Genome Campus, Hinxton, UK
  • 40. Knowledge exchange workshop • Discussion of big data challenges in life sciences • Focus on a few representative domains • Looking 5 years ahead • Jointly identify potential solutions to our problems [Diagram linking life sciences (LS) and ICT e-infrastructures; labels: data, physical facilities, scientific information, transfer, computation, storage]
  • 41. How does it affect data sharing in life sciences?
  • 42. Large-scale data sharing in the life sciences http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
  • 43. How does big data affect data sharing? [Diagram: what, how and where data are shared, in terms of compute, storage and transfer] Source: http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
  • 46. Cost of DNA sequencing
  • 47. Data generation vs. data transfer: network file transfer times for the data generated in 24 hours
| Network | DNA sequencing (~100 GB) | Mass spectrometry (~4 TB) | Microscopy (~4 TB) |
|---|---|---|---|
| 1 Gb/s | ~30 min | ~9 hours | ~9 hours |
| 100 Mb/s | ~5 hours | ~4 days | ~4 days |
| 10 Mb/s | ~2 days | ~5 weeks | ~5 weeks |
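
The transfer times on this slide follow from simple arithmetic: time = size × 8 / bandwidth. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and reads the slide's "Gb" and "Mb" as link speeds in bits per second; `efficiency` is a hypothetical parameter for the fraction of the nominal line rate a real transfer achieves.

```python
def transfer_time_seconds(size_bytes: float, bandwidth_bits_per_s: float,
                          efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link running at a fraction of its line rate."""
    if bandwidth_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    return size_bytes * 8 / (bandwidth_bits_per_s * efficiency)


if __name__ == "__main__":
    datasets = {"DNA sequencing": 100e9, "Mass spectrometry": 4e12}
    links = {"1 Gb/s": 1e9, "100 Mb/s": 100e6, "10 Mb/s": 10e6}
    for name, size in datasets.items():
        for label, rate in links.items():
            hours = transfer_time_seconds(size, rate) / 3600
            print(f"{name} over {label}: {hours:.1f} h at full line rate")
```

At full line rate, 4 TB over 1 Gb/s takes 32,000 s (about 8.9 hours), matching the slide's ~9 hours. For 100 GB the ideal figure is about 13 minutes, so the slide's ~30 minutes presumably allows for a lower effective throughput.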
  • 48. Bottlenecks in life sciences? • Data production grows faster than storage • The cost of data production technologies declines faster than the cost of storage • It takes longer to transfer data than to produce it
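
The consequence of "data production grows faster than storage" is easiest to see if both are treated as exponential growth with different doubling times. The sketch assumes that storage capacity per unit cost doubles every two years (the doubling period the speaker notes quote for compute power) and uses a hypothetical one-year doubling time for data. Neither figure is a measurement.

```python
def growth_gap(years: float, data_doubling_years: float,
               storage_doubling_years: float) -> float:
    """Ratio of data-volume growth to storage-capacity growth after `years`."""
    data_growth = 2 ** (years / data_doubling_years)
    storage_growth = 2 ** (years / storage_doubling_years)
    return data_growth / storage_growth
```

Under these assumptions, data grows 64-fold after six years while storage grows only 8-fold, leaving an 8-fold gap that budgets, compression or data selection would have to close.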
  • 49. Data growth: how to reduce the IT budget shortfall? (source: http://www.eweek.com/)
  • 50. Data growth: how to reduce the IT budget shortfall? (source: http://www.eweek.com/) Optimization: using technology more effectively; selecting relevant data
  • 51. Potential solutions • Storage: data compression; select what we store; evaluate data reproducibility & value of data • Network: faster protocols; partitioning; network upgrade • Computation: clouds; data close to computation
  • 52. Data compression • Efficient representation • Capacity for controlled data reduction • Efficient transformations • Tool chain [Plot: precision vs. compression, showing CRAM] Fritz, M.H., Leinonen, R., et al. (2011) Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 21 (5), 734-40. Cochrane G., Cook C.E. and Birney E. (2012) The future of DNA sequence archiving. GigaScience 2012, 1:2. http://www.ebi.ac.uk/ena/about/cram_toolkit
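
The idea behind reference-based compression, which CRAM builds on, is to store only where a read differs from a reference sequence instead of every base. The toy below illustrates that idea only: it assumes reads aligned without gaps and handles substitutions alone, whereas the real CRAM format also encodes insertions, deletions, quality scores and read names, and applies entropy coding.

```python
from typing import List, Tuple

Diff = Tuple[int, str]            # (offset within the read, substituted base)
Encoded = Tuple[int, int, List[Diff]]  # (reference position, read length, differences)


def encode_read(reference: str, position: int, read: str) -> Encoded:
    """Store a read aligned at `position` as its differences from the reference."""
    segment = reference[position:position + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, base) for i, (ref_base, base) in enumerate(zip(segment, read))
             if base != ref_base]
    return position, len(read), diffs


def decode_read(reference: str, encoded: Encoded) -> str:
    """Rebuild the read from the reference plus the stored differences."""
    position, length, diffs = encoded
    bases = list(reference[position:position + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)
```

A read that matches the reference exactly shrinks to a position and a length, which is why this approach pays off for highly redundant resequencing data.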
  • 53. Data transfer optimization, e.g. getting more from the available bandwidth (Guy Cochrane, EMBL-EBI)
  • 54. Data partitioning: reference-oriented indexing • Organisation of data around biological concepts • Indexing system around these concepts • Support for requests for partitions along this index (Guy Cochrane, EMBL-EBI)
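
Reference-oriented indexing can be sketched as bucketing records by reference sequence and coordinate window, so that a request for one region touches only the relevant partitions. This is a minimal illustration using fixed-size bins and half-open intervals `[start, end)`. It is much simpler than the index schemes used by real alignment formats, and the reference names and record IDs in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[int, int, str]  # (start, end, record id)


class ReferenceIndex:
    """Fixed-size bins keyed by (reference name, bin number)."""

    def __init__(self, bin_size: int = 1000) -> None:
        self.bin_size = bin_size
        self.bins: Dict[Tuple[str, int], List[Entry]] = defaultdict(list)

    def _bin_range(self, start: int, end: int) -> range:
        if end <= start:
            raise ValueError("end must be greater than start")
        return range(start // self.bin_size, (end - 1) // self.bin_size + 1)

    def add(self, reference: str, start: int, end: int, record_id: str) -> None:
        # Register the record in every bin its interval overlaps.
        for b in self._bin_range(start, end):
            self.bins[(reference, b)].append((start, end, record_id))

    def query(self, reference: str, start: int, end: int) -> List[str]:
        # Only the bins covering the requested region are inspected.
        hits = set()
        for b in self._bin_range(start, end):
            for s, e, record_id in self.bins.get((reference, b), []):
                if s < end and start < e:  # the intervals overlap
                    hits.add(record_id)
        return sorted(hits)
```

The bin size trades index size against how many non-matching records each query has to filter out.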
  • 55. What data is relevant?
  • 56. Life sciences diversity: genomes, nucleotides, transcripts, proteins, complexes, pathways, small molecules, structures, domains, cells, biobanks, tissues and organs, human populations, therapies, disease prevention, early diagnosis, human individuals
  • 57. Life sciences diversity • Different communities • Some similar requirements • Not always the same solutions. Communities: proteomics, metabolomics, clinical data, genomics, imaging
  • 58. Some conclusions • Opportunity for e-infrastructures to better understand BMS RI problems • Identification of bottlenecks • Discussion of some potential solutions • Data growth will change how we do things today • Different communities -> different models -> some common solutions • Solutions have to come from use cases • BMS RIs need to define their requirements better • We need to use technology more efficiently • The BMS community has to evaluate the practicality of storing everything • Privacy issues make big data more challenging • It is difficult to separate big data from computation • Shortage of expertise in dealing with scientific data and IT services
  • 59. European Life Sciences Infrastructure for Biological Information www.elixir-europe.org Thank you
  • 61. Data submission [Diagram: raw data, processed data and metadata going to a centralized database]
  • 62. Data sharing: from 'data on my disk and available to anyone who requests it' to 'submission to data repositories'
  • 63. Data submissions [Diagram of submission workflows; labels: data repository, journal submission, reads, journal request, curator, Data Management Plan submission, data management]
  • 64. Data sharing: will big data affect data deposition? From 'data on my disk and available to anyone who requests it' to 'submission to data repositories'
  • 65. Data submissions: how much data? How much of it is available?
  • 66. European Life Sciences Infrastructure for Biological Information www.elixir-europe.org Thank you

Editor's Notes

  1. The previous example leads into the BioMedBridges project, which builds bridges between the infrastructures and is starting to develop data and service bridges to support research projects that will access and benefit from services involving several of them.
  2. As a biologist I would prefer to see all the information in one single database. Centralized databases have this mission: they aim to collect all the information for one specific domain. However, medium-size labs and organizations can produce large amounts of data, which makes it harder to submit data to centralized repositories. Moreover, data producers like to control and structure their own databases, developing their own GUIs and access protocols. For us, the users, it becomes harder to access the information. For one specific domain we might find different databases using different GUIs. We might end up downloading data in different formats, complicating the integration of results. After integration we might face a problem of high redundancy in our results.
  3. Data resource: Sustainability, availability and integration
  4. 'Compute power' doubles every two years; data production doubles faster.
  5. Sequencing prices are falling faster than Moore's law. Moore's law predicts an exponential decline of computing cost, with 'compute power' doubling every two years. Storing data is more expensive than producing it.
  6. Technology gets cheaper and faster. ~15,000 hospitals, ~4,000 universities, ~2,000 life sciences research institutes. How much data will we produce? How will we store it?
  7. decline of computing cost
  8. necessary to understand, develop or reproduce published research
  9. Not all the data make it to the public repositories
  10. necessary to understand, develop or reproduce published research