How to find and provide FAANG Data
The FAANG Data Coordination Centre
Laura Clarke
Vertebrate Data Coordination
www.ebi.ac.uk
@laurastephen
Value of Metadata
Data Access
Metadata Standards
Validation tools
FAANG Data availability
Support
Tara Oceans
•2 ½ year expedition
•210 sampling stations
•Standardized measurements
•Genetic
•Morphological
•Physico-Chemical
Good metadata enables great science
HipSci
•750 iPSC lines
•Healthy and rare disease donors
•Extensive genomic and epigenomic characterization
•All lines and data available to community
Good metadata enables great science
H Kilpinen et al. Nature 546, 370–375 (2017) doi:10.1038/nature22403
Good metadata enables great science
The FAANG Data Coordination Centre
•Supporting Submission
•Ensuring high quality data description
•Making the data accessible
•Providing consistent analysis products
Findable
• Global persistent identifier
• Rich metadata
• Store metadata in registries
Accessible
• Resolvable identifiers
• Metadata persists
• Machine and human access
Interoperable
• Open data format
• Modelled with FAIR compliant vocabularies
• Reference external data
Reusable
• Rich metadata
• Clear license
• Provenance
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). DOI: 10.1038/sdata.2016.18
Alasdair J G Gray
Heriot-Watt University
Ensuring the data is FAIR
• Needs
• Well structured
• Consistent naming
• Specific descriptions
• Enables
• Aggregation
• Integration
• Tracking
Good data is well described data
• Representation of important things in a specific domain
• Describes types of entities (e.g. cells) and relations between them
• An active, formal computational artifact
• A mathematical model based on a subset of first order logic
• Tools can automatically process ontologies for analysis - e.g. gene expression enrichment analysis
• A communication tool
• Provides a dictionary for collaborators, a shared understanding
• Allows data sharing
Use Ontologies
Myeloid Leucocyte
Monocyte
CD14+ Monocyte
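The three labels above form an is-a chain, which is what lets ontology-annotated data be aggregated at any level. A minimal sketch of that idea follows, assuming a toy parent map built only from these three labels and a hypothetical list of sample annotations; it is not how any particular ontology tool stores its graph.

```python
# Toy is-a hierarchy built from the three labels on the slide.
# Real ontologies are much larger graphs; this dict is illustrative only.
PARENT = {
    "CD14+ Monocyte": "Monocyte",
    "Monocyte": "Myeloid Leucocyte",
    "Myeloid Leucocyte": None,
}


def ancestors(term):
    """Return the term followed by all of its ancestors, most specific first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def count_under(samples, ancestor):
    """Count samples annotated with `ancestor` or any of its descendants."""
    return sum(1 for cell_type in samples if ancestor in ancestors(cell_type))


# Hypothetical annotations, one cell type label per sample.
samples = ["CD14+ Monocyte", "Monocyte", "CD14+ Monocyte", "Myeloid Leucocyte"]
print(count_under(samples, "Monocyte"))           # 3
print(count_under(samples, "Myeloid Leucocyte"))  # 4
```

Because each sample carries a specific term, the same annotations answer both the narrow question (monocytes only) and the broad one (all myeloid leucocytes) without re-curation.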
• OLS - The Ontology Lookup Service
• http://www.ebi.ac.uk/ols/index
• Indexes 150 biomedical ontologies
• (4.5 million terms, 11 million relations)
• Zooma
• http://www.ebi.ac.uk/spot/zooma/
• Using past knowledge to inform new annotation
• Curated mappings from the Expression Atlas, Open Targets and others
• Webulous
• http://www.ebi.ac.uk/spot/webulous/
• GoogleSheets template system
• Create new ontology terms
• OXO (in beta)
• http://www.ebi.ac.uk/spot/oxo/
• Cross references between ontologies
• All services have API and UI access
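Since the services expose an API as well as a UI, a term lookup can be scripted. The sketch below only builds a search URL and makes no network call. The endpoint path and the `q` and `ontology` parameter names are assumptions based on the public OLS REST interface and may have changed, so check the current OLS documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL for the OLS search endpoint (not stated on the slide).
OLS_SEARCH = "https://www.ebi.ac.uk/ols/api/search"


def ols_search_url(label, ontology=None):
    """Build a search URL for a free-text label, optionally limited to one ontology."""
    params = {"q": label}
    if ontology:
        params["ontology"] = ontology
    return OLS_SEARCH + "?" + urlencode(params)


# "cl" is used here as an illustrative ontology identifier.
print(ols_search_url("monocyte", ontology="cl"))
```

The resulting URL can be fetched with any HTTP client; the response format is not shown here because it is not described in the slides.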
Webulous
Use Ontologies
Supporting deposition of well described data
FAANG Validation Service
Validates completed metadata Excel templates and prepares metadata for archive submission
http://www.ebi.ac.uk/vg/faang
Supporting deposition of well described data
•Checks ontologies (scope, accuracy, terms).
•Relationships (familial, breeds).
•Minimum standards and validity.
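The three kinds of check listed above can be illustrated with a small sketch. This is not the FAANG validation service: the field names (`sample_name`, `species_term`, `breed`, `child_of`) and the term ID pattern are hypothetical, chosen only to show a minimum-fields check, an ontology term format check and a relationship check side by side.

```python
import re

# Hypothetical minimum fields; the real FAANG rule sets define their own.
REQUIRED = ("sample_name", "species_term", "breed")
# Assumed shape of an ontology term ID: PREFIX_digits or PREFIX:digits.
TERM_ID = re.compile(r"^[A-Za-z]+[_:]\d+$")


def validate(records):
    """Return (sample_name, problem) tuples; an empty list means every check passed."""
    names = {r.get("sample_name") for r in records}
    problems = []
    for r in records:
        name = r.get("sample_name", "<unnamed>")
        # Minimum standards: every required field must be present and non-empty.
        for field in REQUIRED:
            if not r.get(field):
                problems.append((name, f"missing {field}"))
        # Ontology check: the species must be given as a term ID, not free text.
        term = r.get("species_term")
        if term and not TERM_ID.match(term):
            problems.append((name, f"malformed ontology term {term!r}"))
        # Relationship check: a referenced parent must exist in the same submission.
        parent = r.get("child_of")
        if parent and parent not in names:
            problems.append((name, f"unknown parent {parent!r}"))
    return problems


records = [
    {"sample_name": "hen_1", "species_term": "NCBITaxon_9031", "breed": "White Leghorn"},
    {"sample_name": "chick_1", "species_term": "chicken", "breed": "White Leghorn",
     "child_of": "hen_2"},
]
for sample, problem in validate(records):
    print(sample, problem)
```

Collecting all problems rather than stopping at the first lets a submitter fix a whole template in one pass.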
Supporting deposition of well described data
• On conversion, validates again and checks project information.
• If passes, returns correctly formatted SampleTab for BioSamples and XML for ENA.
Supporting deposition of well described data
• The Validation service code and website
• http://www.ebi.ac.uk/vg/faang
• https://github.com/faang/faang-metadata
• https://github.com/FAANG/validate-metadata
How much data?
[Chart: share of donors and specimens by species (Gallus gallus, Ovis aries, Sus scrofa, Bos taurus, Bubalus bubalis, Capra hircus), 0% to 100%]
285 Donor Animals (segment counts shown: 132, 62, 56, 14, 13, 8)
8428 Specimens (segment counts shown: 678, 2479, 1423, 1667, 941, 1240)
How much data?
8 European Nucleotide Archive studies submitted
• 4891 sequencing runs
Largest submission
• RNA sequencing of tissues and cell types from Scottish Blackface x Texel sheep for transcriptome annotation and expression analysis, The Roslin Institute
• 3994 sequencing runs
Finding the FAANG Data
http://data.faang.org/home
Finding the FAANG Data
http://data.faang.org/organism
Finding the FAANG Data
http://data.faang.org/organism/SAMEA103886117
Finding the FAANG Data
http://data.faang.org/specimen/SAMEA103886170
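The example addresses above suggest that each record page is addressed by record type plus BioSamples accession. A minimal sketch of building such links follows, assuming the pattern seen in these two URLs holds in general and that accessions start with `SAMEA` as they do here; neither assumption is confirmed by the slides.

```python
import re

BASE = "http://data.faang.org"
# Assumed accession shape, inferred only from the two examples on the slides.
ACCESSION = re.compile(r"^SAMEA\d+$")


def record_url(kind, accession):
    """Return the portal URL for an organism or specimen record."""
    if kind not in ("organism", "specimen"):
        raise ValueError(f"unknown record type: {kind!r}")
    if not ACCESSION.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f"{BASE}/{kind}/{accession}"


print(record_url("organism", "SAMEA103886117"))
print(record_url("specimen", "SAMEA103886170"))
```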
Finding the FAANG Data
•More Data
• Additional FAANG data
• Other livestock data using legacy standards
•Standard Analysis products
•Trackhub links
•Better search
•Sortable tables
Who is helping you?
Peter Harrison Jun Fan
faang-dcc@ebi.ac.uk
Overview
[Chart: Donor and Specimen bars on a 0% to 100% axis]
Questions?
Find out how to submit data
http://bit.ly/FAANGArchiveGuide
Ask for help
faang-dcc@ebi.ac.uk
@faangomics on Twitter
Let us know about your project
http://bit.ly/FAANGProjectRegistry

L clarke faang_dcc_isag_2017_compress


Editor's Notes

  1. I have a good example from Tara Oceans of where metadata relating to samples allows image and sequence samples to be aligned and a close ecological relationship to be discovered between an alga and a diatom. Essentially, high-throughput sequence data showed more-than-expected co-location of two species; this led to paring down to a number of bodies of water in given locations (metadata), and high-throughput image samples from the same bodies of water could then be selectively inspected to reveal how close the ecological relationship was.
  2. Why? Improve your analysis: easier to find batch effects and confounding factors. Make your data usable: reduce ambiguity; facilitate reproduction of results; improve integration across labs, projects and data modalities. Make your data discoverable: by other researchers and by integration services (Ensembl, Gene Expression Atlas).