Data submissions and archiving raw data in life sciences: a pilot with proteomics data (update)
Rafael C Jimenez, ELIXIR CTO
ELIXIR HoN & TCG meeting, 7 May 2014
ELIXIR: European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
EUDAT services
• File storage
Data examples
Raw data, processed data and metadata
• Processed data: LPISASHSSK…, TTGTTATCCG…
• Metadata: DNA, Human, Liver, Mitochondria, W. Smith, …; Peptide, Mouse, Heart, Nucleus, J. Heinz, …
Data submissions
[Figure: submissions of raw data, processed data and metadata go into a data repository, which provides search and integration.]
Proteomics data in PRIDE
~85% raw data
Delegation of raw data
[Figure: processed data and metadata are submitted to the data repository (search, integration); raw data are referenced through a PID (persistent identifier).]
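The delegation idea can be sketched as a repository record that stores metadata and processed data itself but keeps only persistent identifiers (PIDs) for raw files held in an external store, such as the EUDAT file storage listed under EUDAT services. This is a minimal sketch under stated assumptions: the class names, the resolver callback and the "hdl:0000/…" PID strings are hypothetical and are not part of any PRIDE or EUDAT API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical data model for illustration only; not the PRIDE or EUDAT schema.


@dataclass(frozen=True)
class RawDataPointer:
    """Reference to a raw file kept outside the repository."""
    pid: str          # persistent identifier (made-up handle-style format)
    filename: str
    size_bytes: int


@dataclass
class RepositoryEntry:
    """What the repository stores itself: metadata and processed data."""
    accession: str
    metadata: dict[str, str]
    processed_files: list[str]
    raw_data: list[RawDataPointer] = field(default_factory=list)

    def delegated_bytes(self) -> int:
        """Volume of raw data held by the external store, not by the repository."""
        return sum(pointer.size_bytes for pointer in self.raw_data)

    def raw_locations(self, resolve: Callable[[str], str]) -> dict[str, str]:
        """Map each raw filename to the location its PID currently resolves to."""
        return {pointer.filename: resolve(pointer.pid) for pointer in self.raw_data}


# In-memory stand-in for a PID resolution service (hypothetical URLs).
pid_registry = {
    "hdl:0000/raw-001": "https://storage.example.org/raw-001.raw",
    "hdl:0000/raw-002": "https://storage.example.org/raw-002.raw",
}

entry = RepositoryEntry(
    accession="PXD000000",  # placeholder accession
    metadata={"species": "Homo sapiens", "tissue": "Liver"},
    processed_files=["identifications.mzid"],
    raw_data=[
        RawDataPointer("hdl:0000/raw-001", "run1.raw", 2_000_000_000),
        RawDataPointer("hdl:0000/raw-002", "run2.raw", 3_000_000_000),
    ],
)

print(entry.delegated_bytes())  # 5000000000
print(entry.raw_locations(lambda pid: pid_registry[pid]))
```

Because the repository only holds PIDs, raw files can move between storage locations without touching the repository record; only the resolver's answer changes.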
ProteomeXchange
[Figure: ProteomeXchange data flow between ProteomeCentral, the receiving repositories PRIDE (MS/MS data) and PASSEL (SRM data), GPMDB, PeptideAtlas, UniProt/NeXtProt, journals and other databases. Exchanged data: metadata/manuscript, researcher's results, reprocessed results and raw data*.]
Vizcaíno et al., Nature Biotechnology, 2014
• Framework to enable standard data submission and
dissemination pipelines between the main existing
proteomics resources.
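In the ProteomeXchange figure each receiving repository is labelled with the data type it accepts: PRIDE for MS/MS data and PASSEL for SRM data. A minimal sketch of that routing step, assuming a simple two-value enumeration of experiment types (the enum, table and function names are hypothetical and not part of any ProteomeXchange tool):

```python
from enum import Enum


class ExperimentType(Enum):
    MS_MS = "MS/MS"  # shotgun (tandem mass spectrometry) data
    SRM = "SRM"      # selected reaction monitoring data


# Receiving repository per experiment type, as labelled in the figure.
RECEIVING_REPOSITORY = {
    ExperimentType.MS_MS: "PRIDE",
    ExperimentType.SRM: "PASSEL",
}


def receiving_repository(experiment: str) -> str:
    """Return the receiving repository for an experiment type label such as 'SRM'."""
    try:
        kind = ExperimentType(experiment)  # lookup by enum value
    except ValueError:
        raise ValueError(f"no receiving repository configured for {experiment!r}") from None
    return RECEIVING_REPOSITORY[kind]


print(receiving_repository("MS/MS"))  # PRIDE
print(receiving_repository("SRM"))    # PASSEL
```

Keeping the mapping in one table makes adding another receiving repository a one-line change rather than a new branch.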
[Figure: the ProteomeXchange diagram repeated, with the added labels "Raw Data*" and "Pilot …".]
[Figure: the ProteomeXchange diagram repeated, with the added labels "Raw Data*" and "Option …".]
Thank you for your attention
Preparing for the data deluge in life sciences
BioMedBridges workshop
15-16 May 2014
Wellcome Trust Genome Campus
Hinxton, UK
To request attendance, contact Stephanie Suhr <ssuhr@ebi.ac.uk>
Important points to explore
• EUDAT programmatic access for data submission
• Synchronization for data updates
• Keep unpublished datasets private until publication
• Keep up with the growth (40 TB, doubling every 4 months)
• Use HPC facilities through EUDAT to process raw data
• Serve as an example for other life-science repositories
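The growth figure in the list can be turned into a rough projection. Assuming the stated rate simply persists (40 TB now, doubling every 4 months; real growth rarely stays exponential for long), the volume after t months is V(t) = 40 × 2^(t/4) TB, so after 12 months it is 40 × 2^3 = 320 TB. A sketch of that arithmetic, with hypothetical function names:

```python
import math

START_TB = 40.0        # current archive size from the slide (>40 TB)
DOUBLING_MONTHS = 4.0  # stated doubling time


def projected_tb(months: float) -> float:
    """Projected archive size after `months`, assuming constant exponential growth."""
    return START_TB * 2 ** (months / DOUBLING_MONTHS)


def months_until(target_tb: float) -> float:
    """Months until the archive reaches `target_tb`, under the same assumption."""
    return DOUBLING_MONTHS * math.log2(target_tb / START_TB)


for months in (4, 12, 24):
    print(months, projected_tb(months))
print(round(months_until(1000), 1))
```

This prints 80.0, 320.0 and 2560.0 TB for 4, 12 and 24 months, and 18.6: at that rate the archive would pass 1 PB (1,000 TB) in roughly a year and a half.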
Research infrastructures
[Figure labels: Data; ICT e-infrastructures; LS (life sciences); Facilitate research; Physical facilities; Scientific information; Communication; Computation; Storage.]
Overview of shotgun proteomics data production
Noble WS, MacCoss MJ (2012) Computational and Statistical Analysis of Protein Mass Spectrometry Data. PLoS Comput Biol 8(1): e1002296. doi:10.1371/journal.pcbi.1002296
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002296
MKKKNIYSIRKLGVG
IASVTLGTLLISG
GVTPAANAAQHD
FYQVLNMPNLNADQ
RNGFIQSLK
DDPSQSANVKLN
Peptide sequences
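In shotgun proteomics, proteins are enzymatically digested into peptides before mass spectrometry, usually with trypsin. As an illustration only (not taken from the cited paper), the common simplified trypsin rule, cleave after K or R unless the next residue is P, can be applied to the sequence fragments shown on the slide (MKKKNIYSIRKLGVG, IASVTLGTLLISG, …), joined here on the assumption that they form one protein sequence. The function name is hypothetical.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence into tryptic peptides (no missed cleavages).

    Simplified rule: cut C-terminal to K or R, except when the next residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide not ending in K/R
    return peptides


# Fragments from the slide, joined into one sequence (assumption).
fragments = ["MKKKNIYSIRKLGVG", "IASVTLGTLLISG", "GVTPAANAAQHD",
             "FYQVLNMPNLNADQ", "RNGFIQSLK", "DDPSQSANVKLN"]
protein = "".join(fragments)

for peptide in tryptic_digest(protein):
    print(peptide)
```

Real search engines also allow missed cleavages and filter peptides by length and mass; this sketch does neither.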
PRIDE (PRoteomics IDEntifications) database
Mass spectrometry
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013
Origin:
152 USA
108 Germany
67 United Kingdom
53 Switzerland
48 Netherlands
42 China
42 Canada
41 France
36 Spain
33 Belgium
25 Australia
23 Sweden
17 Japan
16 Denmark
13 Norway
12 Finland
12 India
12 Taiwan
10 Italy
9 Republic of Korea
8 Austria
8 Ireland
8 Brazil
7 Singapore
5 Israel
5 Russia …
821 datasets submitted as of 1 April 2014
Type:
273 PRIDE complete
501 PRIDE partial
47 PeptideAtlas/PASSEL complete
Access:
38.3% PRIDE public
5.3% PASSEL public
56% PRIDE private
0.4% PASSEL private
Data volume:
Total: >40 TB
Number of all files: >120,000
PXD000320-324: ~5 TB
PXD000065: ~1.4 TB
Top species (studied in at least 8 datasets):
381 Homo sapiens
100 Mus musculus
31 Arabidopsis thaliana
26 Saccharomyces cerevisiae
16 Escherichia coli
14 Rattus norvegicus
12 Mycobacterium tuberculosis
11 Drosophila melanogaster
~ 215 species in total
Submissions/year:
2012: 102
2013: 527
2014: 192
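ProteomeXchange accessions such as PXD000065 have the form "PXD" followed by six digits, and the data volume list abbreviates a consecutive series as PXD000320-324. A small helper that expands that shorthand into full accessions; the range syntax it accepts is an assumption based only on this one example, and the function name is hypothetical.

```python
import re

# "PXD" + 6 digits, optionally "-" plus the trailing digits of the range end.
ACCESSION_RANGE = re.compile(r"^PXD(\d{6})(?:-(\d{1,6}))?$")


def expand_accessions(text: str) -> list[str]:
    """Expand 'PXD000320-324' into ['PXD000320', ..., 'PXD000324']."""
    match = ACCESSION_RANGE.match(text)
    if match is None:
        raise ValueError(f"not a PXD accession or range: {text!r}")
    first_digits, end_suffix = match.groups()
    first = int(first_digits)
    if end_suffix is None:
        return [f"PXD{first:06d}"]
    # The range end reuses the leading digits of the start: 000320-324 -> 000324.
    last = int(first_digits[: 6 - len(end_suffix)] + end_suffix)
    if last < first:
        raise ValueError(f"range end precedes start: {text!r}")
    return [f"PXD{n:06d}" for n in range(first, last + 1)]


print(expand_accessions("PXD000320-324"))
print(expand_accessions("PXD000065"))
```

Expanded this way, the ~5 TB entry on the slide covers five datasets, PXD000320 to PXD000324.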
Acknowledgements
PeptideAtlas Team
Eric Deutsch
Terry Farrah
Zhi Sun
Andrew R. Jones
Lennart Martens
Juan Pablo Albar
Martin Eisenacher
Gil Omenn
And many other PX partners and stakeholders
PRIDE team
Juan Antonio Vizcaino
Henning Hermjakob
Attila Csordas
Rui Wang
Florian Reisinger
Tobias Ternent
Noemi del Toro
Jose A. Dianes
Yasset Perez-Riverol
Qing-Wei Xu
Ilias Lavidas

Data submissions and archiving raw data in life sciences: a pilot with proteomics data

  • 1. European Life Sciences Infrastructure for Biological Information, www.elixir-europe.org. Data submissions and archiving raw data in life sciences: a pilot with proteomics data. Rafael C Jimenez, ELIXIR CTO. ELIXIR HoN & TCG meeting, 7 May 2014 (update).
  • 3. Data examples: raw data, processed data, metadata. Examples shown: DNA, Human, Liver, Mitochondria, W. Smith, …; Peptide, Mouse, Heart, Nucleus, J. Heinz, …; sequences LPISASHSSK…, TTGTTATCCG…
  • 4. Data submissions: raw data, processed data and metadata are submitted to a data repository, which supports search and integration.
  • 5. Proteomics data in PRIDE: ~85% raw data
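  • 6. Delegation of raw data: the data repository keeps the processed data and metadata (submissions, search, integration), while the raw data are delegated and referenced through a PID (persistent identifier).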
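A minimal sketch of the delegation idea, assuming the repository record keeps metadata and processed data itself and stores only persistent identifiers (PIDs) for raw files held by an external storage service. The class, field and function names, the PID strings and the `pid_registry` lookup table are all hypothetical; a real PID system (for example a handle service) would resolve identifiers through its own interface rather than a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical repository entry: raw data are delegated, not stored."""
    accession: str                   # repository accession (hypothetical value)
    metadata: dict                   # kept and indexed by the repository
    processed_files: list[str]       # kept by the repository for search and integration
    raw_data_pids: list[str] = field(default_factory=list)  # raw files stored elsewhere


def resolve_raw_data(record: DatasetRecord, pid_registry: dict[str, str]) -> list[str]:
    """Map each raw-data PID to its current storage location.

    `pid_registry` stands in for a real PID resolver (assumption). An
    unresolvable PID raises KeyError so broken links are not silently dropped.
    """
    return [pid_registry[pid] for pid in record.raw_data_pids]


record = DatasetRecord(
    accession="EXAMPLE-0001",
    metadata={"species": "Homo sapiens"},
    processed_files=["results.xml"],
    raw_data_pids=["pid:example/raw-001", "pid:example/raw-002"],
)
registry = {
    "pid:example/raw-001": "https://storage.example.org/raw-001",
    "pid:example/raw-002": "https://storage.example.org/raw-002",
}
print(resolve_raw_data(record, registry))
```

Because the record holds only identifiers, raw files could move to another storage site by updating the registry entry alone, leaving the repository entry unchanged.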
  • 7. ProteomeXchange (Vizcaíno et al., Nature Biotechnology, 2014): a framework to enable standard data submission and dissemination pipelines between the main existing proteomics resources. Diagram: ProteomeCentral; metadata/manuscript; raw data*; results; journals; UniProt/neXtProt; PeptideAtlas; other DBs; GPMDB; receiving repositories PASSEL (SRM data) and PRIDE (MS/MS data). Legend: researcher's results, reprocessed results, raw data*, metadata.
  • 8. ProteomeXchange diagram as in slide 7 (Vizcaíno et al., Nature Biotechnology, 2014), with an additional "Raw data*" element labelled "Pilot …"; receiving repositories PASSEL (SRM data) and PRIDE (MS/MS data).
  • 9. ProteomeXchange diagram as in slide 7 (Vizcaíno et al., Nature Biotechnology, 2014), with an additional "Raw data*" element labelled "Option …"; receiving repositories PRIDE (MS/MS data) and PASSEL (SRM data).
  • 10. European Life Sciences Infrastructure for Biological Information, www.elixir-europe.org. Thank you for your attention.
  • 11. Preparing for the data deluge in life sciences: BioMedBridges workshop, 15-16 May 2014, Wellcome Trust Genome Campus, Hinxton, UK. To request attendance, contact Stephanie Suhr <ssuhr@ebi.ac.uk>.
  • 12. Important points to explore:
      • EUDAT programmatic access for data submission
      • Synchronization for data updates
      • Keeping unpublished datasets private until publication
      • Keeping up with the growth (40 TB, doubling every 4 months)
      • Using HPC facilities through EUDAT to process raw data
      • Serving as an example for other repositories in life sciences
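The growth figure can be turned into a simple projection. A minimal sketch, assuming the 4-month doubling time stays constant and taking 40 TB as the starting volume: V(t) = 40 · 2^(t/4) TB after t months, i.e. a factor of 2^3 = 8 per year. The function name and its defaults are illustrative only.

```python
def projected_volume_tb(months: float, start_tb: float = 40.0, doubling_months: float = 4.0) -> float:
    """Volume after `months`, assuming exponential growth with a fixed doubling time."""
    return start_tb * 2 ** (months / doubling_months)


for months in (4, 12, 24):
    print(f"{months:>2} months: {projected_volume_tb(months):,.0f} TB")
```

Under these assumptions the archive would hold about 320 TB after one year and about 2,560 TB (2.56 PB) after two, which is why storage and synchronization appear among the points to explore.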
  • 13. Research infrastructures (diagram): data; ICT e-infrastructures; LS (life sciences); facilitate research; physical facilities; scientific information; communication; computation; storage.
  • 14. Overview of shotgun proteomics data production. Figure from Noble WS, MacCoss MJ (2012) Computational and Statistical Analysis of Protein Mass Spectrometry Data. PLoS Comput Biol 8(1): e1002296. doi:10.1371/journal.pcbi.1002296, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002296. Figure labels: MKKKNIYSIRKLGVG IASVTLGTLLISG GVTPAANAAQHD FYQVLNMPNLNADQ RNGFIQSLK DDPSQSANVKLN; 4 peptide sequences.
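In shotgun proteomics, proteins are digested into peptides before mass spectrometry, most commonly with trypsin. The sketch below applies the simplified trypsin rule (cleave after K or R unless the next residue is P) to the first sequence fragment shown on the slide. It ignores missed cleavages and is not meant to reproduce the figure's own peptide split; the function name is hypothetical.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence into peptides with the simplified trypsin rule.

    Cleaves C-terminal to K or R, except when the following residue is P.
    Missed cleavages and other enzymes are not modelled.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    # Splitting on empty matches requires Python 3.7+.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [p for p in pieces if p]  # drop the empty string left after a C-terminal K/R


print(tryptic_digest("MKKKNIYSIRKLGVG"))
```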
  • 15. PRIDE (PRoteomics IDEntifications) database; mass spectrometry. Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2013.
  • 16. 821 submitted datasets by April 1st 2014
      • Origin: 152 USA, 108 Germany, 67 United Kingdom, 53 Switzerland, 48 Netherlands, 42 China, 42 Canada, 41 France, 36 Spain, 33 Belgium, 25 Australia, 23 Sweden, 17 Japan, 16 Denmark, 13 Norway, 12 Finland, 12 India, 12 Taiwan, 10 Italy, 9 Republic of Korea, 8 Austria, 8 Ireland, 8 Brazil, 7 Singapore, 5 Israel, 5 Russia, …
      • Type: 273 PRIDE complete, 501 PRIDE partial, 47 PeptideAtlas/PASSEL complete
      • Access: 38.3% PRIDE public, 5.3% PASSEL public, 56% PRIDE private, 0.4% PASSEL private
      • Data volume: total >40 TB; >120,000 files; PXD000320-324: ~5 TB; PXD000065: ~1.4 TB
      • Top species (studied in at least 8 datasets): 381 Homo sapiens, 100 Mus musculus, 31 Arabidopsis thaliana, 26 Saccharomyces cerevisiae, 16 Escherichia coli, 14 Rattus norvegicus, 12 Mycobacterium tuberculosis, 11 Drosophila melanogaster; ~215 species in total
      • Submissions per year: 2012: 102; 2013: 527; 2014: 192
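As a quick consistency check on the figures above: the yearly submission counts and the per-type counts should each add up to the 821 datasets, and the access shares should add up to 100%. All numbers are copied from the slide; the country list is truncated ("…") and is deliberately left out, and the 2014 count covers only the period up to April 1st.

```python
# Figures copied from the slide (821 datasets submitted by April 1st 2014).
TOTAL_DATASETS = 821
submissions_per_year = {2012: 102, 2013: 527, 2014: 192}
datasets_by_type = {
    "PRIDE complete": 273,
    "PRIDE partial": 501,
    "PeptideAtlas/PASSEL complete": 47,
}
access_percent = {
    "PRIDE public": 38.3,
    "PASSEL public": 5.3,
    "PRIDE private": 56.0,
    "PASSEL private": 0.4,
}

year_total = sum(submissions_per_year.values())
type_total = sum(datasets_by_type.values())
percent_total = sum(access_percent.values())  # floating-point sum, so round for display

print(year_total, type_total, round(percent_total, 1))
```

Both breakdowns add up to 821 and the access shares to 100%, so the slide's totals are internally consistent.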
  • 17. Acknowledgements. PeptideAtlas team: Eric Deutsch, Terry Farrah, Zhi Sun. Andrew R. Jones, Lennart Martens, Juan Pablo Albar, Martin Eisenacher, Gil Omenn, and many other PX partners and stakeholders. PRIDE team: Juan Antonio Vizcaíno, Henning Hermjakob, Attila Csordas, Rui Wang, Florian Reisinger, Tobias Ternent, Noemi del Toro, Jose A. Dianes, Yasset Perez-Riverol, Qing-Wei Xu, Ilias Lavidas.

Editor's Notes

  1. Proteomics is the large-scale study of proteins, particularly their structures and functions. Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge (m/z) ratio of charged particles.
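For peptides measured as protonated ions, the m/z value follows from the neutral monoisotopic mass M and the charge state z as m/z = (M + z·m_p)/z, where m_p ≈ 1.007276 Da is the proton mass. A minimal sketch, assuming positive-mode [M+zH]^z+ ions; the function name and the 1000 Da example mass are illustrative.

```python
PROTON_MASS_DA = 1.007276  # proton mass in daltons (rounded)


def mz_for_charge(neutral_mass_da: float, charge: int) -> float:
    """m/z of an [M + zH]^z+ ion from a neutral mass M and a charge z >= 1."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


for z in (1, 2, 3):
    print(z, round(mz_for_charge(1000.0, z), 4))
```

The same peptide therefore appears at different m/z values (about 1001.0, 501.0 and 334.3 here) depending on its charge state.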