European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Rafael C Jimenez
ELIXIR CTO
EUDAT conference, 25 September 2014
Proteomics repositories integration
using EUDAT resources
Data submissions
Submissions
raw data
processed data
metadata
Data
repository
Search
Integration
Noble WS, MacCoss MJ (2012) Computational and Statistical Analysis of Protein Mass Spectrometry Data. PLoS Comput Biol 8(1):
e1002296. doi:10.1371/journal.pcbi.1002296
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002296
Overview of shotgun proteomics data production
MKKKNIYSIRKLGVG
IASVTLGTLLISG
GVTPAANAAQHD
FYQVLNMPNLNADQ
RNGFIQSLK
DDPSQSANVKLN
Peptide sequences
Raw data Processed data
Metadata
Data examples
Raw data Processed data Metadata
DNA
Human
Liver
Mitochondria
W. Smith
…
Peptide
Mouse
Heart
Nucleus
J. Heinz
…
LPISASHSSK…
TTGTTATCCG…
… … …
Proteomics data in PRIDE
~85% raw data
ProteomeCentral
Metadata /
Manuscript
Raw Data*
Results
Journals
UniProt/
NeXtProt
Peptide Atlas
Other DBs
Receiving repositories
PASSEL
(SRM data)
PRIDE
(MS/MS data)
GPMDB
Researcher’s results Reprocessed results Raw data* Metadata
ProteomeXchange
Vizcaíno et al., Nature Biotechnology, 2014
• Framework to enable standard data submission and
dissemination pipelines between the main existing
proteomics resources.
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013
PRIDE (PRoteomics IDEntifications) database
Mass spectrometry
Origin:
152 USA
108 Germany
67 United Kingdom
53 Switzerland
48 Netherlands
42 China
42 Canada
41 France
36 Spain
33 Belgium
25 Australia
23 Sweden
17 Japan
16 Denmark
13 Norway
12 Finland
12 India
12 Taiwan
10 Italy
9 Republic of Korea
8 Austria
8 Ireland
8 Brazil
7 Singapore
5 Israel
5 Russia …
Type:
273 PRIDE complete
501 PRIDE partial
47 PeptideAtlas/PASSEL complete
Access:
38.3% PRIDE public
5.3% PASSEL public
56% PRIDE private
0.4% PASSEL private
Data volume:
Total: >40 TB
Number of all files: >120,000
PXD000320-324: ~5 TB
PXD000065: ~1.4 TB
Top species (studied in at least 8 datasets):
381 Homo sapiens
100 Mus musculus
31 Arabidopsis thaliana
26 Saccharomyces cerevisiae
16 Escherichia coli
14 Rattus norvegicus
12 Mycobacterium tuberculosis
11 Drosophila melanogaster
~ 215 species in total
Submissions/year:
2012: 102
2013: 527
2014: 192
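The counts on this slide can be cross-checked with a little arithmetic. The sketch below copies the figures by type, by year and by access level from the slide (the variable names are illustrative and not part of the talk). It checks that the type and year breakdowns add up to the same number of datasets and that the access shares sum to 100%.

```python
# Figures copied from the slide; all variable names are illustrative.
by_type = {
    "PRIDE complete": 273,
    "PRIDE partial": 501,
    "PeptideAtlas/PASSEL complete": 47,
}
by_year = {2012: 102, 2013: 527, 2014: 192}
access_pct = {
    "PRIDE public": 38.3,
    "PASSEL public": 5.3,
    "PRIDE private": 56.0,
    "PASSEL private": 0.4,
}

# Both breakdowns should describe the same set of datasets.
total_by_type = sum(by_type.values())
total_by_year = sum(by_year.values())

# Share of each submission type, in percent, rounded to one decimal.
type_share = {name: round(100 * n / total_by_type, 1) for name, n in by_type.items()}

# Rounded to absorb floating-point noise in the sum.
access_total = round(sum(access_pct.values()), 1)

print("datasets by type:", total_by_type)
print("datasets by year:", total_by_year)
print("type shares (%):", type_share)
print("access shares sum to (%):", access_total)
```

Partial submissions make up the majority of datasets in these figures, which is consistent with the slide listing roughly twice as many partial as complete PRIDE submissions.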
Pilot evolution
• Use EUDAT
• Replication of ELIXIR data in EUDAT data centers
• Delegation of ELIXIR data in EUDAT data centers
• Adopt EUDAT
• Replication of ELIXIR data in ELIXIR data centers using EUDAT
technology
Replication of ELIXIR data in EUDAT data
centers
Central repository Data storage centers
Meta
data
Raw
Data
Meta
data
Results
Raw
Data
ProteomeCentral
Metadata /
Manuscript
Raw Data*
Results
Journals
UniProt/
NeXtProt
Peptide Atlas
Other DBs
Receiving repositories
GPMDB
Researcher’s results Reprocessed results Raw data* Metadata
Vizcaíno et al., Nature Biotechnology, 2014
Raw Data*
PASSEL
(SRM data)
PRIDE
(MS/MS data)
Replication of ELIXIR data in EUDAT data
centers
Delegation of ELIXIR data in EUDAT data
centers
Central repository Data storage centers
Meta
data
Raw
Data
Meta
data
Results
Raw
Data
ProteomeCentral
Metadata /
Manuscript
Raw Data*
Results
Journals
UniProt/
NeXtProt
Peptide Atlas
Other DBs
Receiving repositories
GPMDB
Researcher’s results Reprocessed results Raw data* Metadata
Vizcaíno et al., Nature Biotechnology, 2014
Raw Data*
PRIDE
(MS/MS data)
PASSEL
(SRM data)
Delegation of ELIXIR data in EUDAT data
centers
Replication of ELIXIR data in ELIXIR data centers
using EUDAT technology
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Plans
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Data storage centers
Meta
data
Raw
Data
1.- ELIXIR replication
2.- EUDAT replication
Plans
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Data storage centers
Meta
data
Raw
Data
3.- delegation
ELIXIR Pilot action
EUDAT services
File sharing model
CSC
BILS
Site B
Site C
EUDAT CDI
ELIXIR
B2SAFE
B2SAFE
B2SAFE
B2SAFE
PRIDE
EMBL-EBI
Pilot – EUDAT adoption: ELIXIR replication
CSC
BILS
Site B
Site C
EUDAT CDI
ELIXIR
B2SAFE
B2SAFE
B2SAFE
B2SAFE
PRIDE
EMBL-EBI
Central repository
National proteomics centers
Meta
data
Results
Raw
Data
Meta
data
Results
Raw
Data
PIDs
ELIXIR
community center
ELIXIR
Data center 1
EUDAT
Data center 1
CSC
PRIDE
BILS
Status
• BILS
• Migrating from existing Swestore dCache to iRODS
• Testing compatibility with B2SAFE
• Latest iRODS not compatible with B2SAFE?
• PRIDE
• iRODS service installed
• B2SAFE module has been deployed at EMBL-EBI (PRIDE)
• Test B2SAFE replication PRIDE -> CSC
• DOI for datasets
• PID for dataset files
• Web service to associate datasets to dataset files
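The last three items can be pictured as a lookup that ties one dataset identifier to the persistent identifiers of its files. The sketch below is a minimal in-memory illustration of that association, not the PRIDE web service itself. The class `DatasetRegistry`, its method names and the identifier strings are all hypothetical placeholders.

```python
from __future__ import annotations

from collections import defaultdict


class DatasetRegistry:
    """Hypothetical in-memory map between a dataset DOI and its file PIDs."""

    def __init__(self) -> None:
        self._files_by_dataset: dict[str, list[str]] = defaultdict(list)
        self._dataset_by_file: dict[str, str] = {}

    def register_file(self, dataset_doi: str, file_pid: str) -> None:
        """Attach one file PID to a dataset; a PID may belong to one dataset only."""
        if file_pid in self._dataset_by_file:
            raise ValueError(f"file PID already registered: {file_pid}")
        self._files_by_dataset[dataset_doi].append(file_pid)
        self._dataset_by_file[file_pid] = dataset_doi

    def files_for(self, dataset_doi: str) -> list[str]:
        """Return the file PIDs of a dataset (empty list if unknown)."""
        return list(self._files_by_dataset.get(dataset_doi, []))

    def dataset_for(self, file_pid: str) -> str | None:
        """Return the dataset a file PID belongs to, or None if unknown."""
        return self._dataset_by_file.get(file_pid)


# Placeholder identifiers, invented for illustration only.
registry = DatasetRegistry()
registry.register_file("doi:example/dataset-A", "pid:example/file-1")
registry.register_file("doi:example/dataset-A", "pid:example/file-2")
print(registry.files_for("doi:example/dataset-A"))
print(registry.dataset_for("pid:example/file-2"))
```

Keeping the lookup in both directions means a replicated file found at a storage center can be traced back to its dataset, and a dataset page can list every file it owns.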
Status
In progress
• Handle System Registration
• Test requests of EPIC/EUDAT identifiers
Open questions
• BILS local PIDs?
• Sync back from PRIDE to BILS for modifications/additions at PRIDE?
• Data push or pull model?
• Replication of processed data requires prior validation
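The final point implies a gating rule in front of any replication step. Here is a minimal sketch of such a rule, assuming each file record carries a kind label and a validation flag. The `FileRecord` type, its field names and the choice to let raw data and metadata through without validation are assumptions made for illustration, since the slides do not say how validation is recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical description of one file awaiting replication."""
    name: str
    kind: str               # "raw", "processed" or "metadata" (labels assumed)
    validated: bool = False


def eligible_for_replication(record: FileRecord) -> bool:
    """Processed data must be validated first; other kinds pass (assumption)."""
    if record.kind == "processed":
        return record.validated
    return True


def select_for_replication(records: list[FileRecord]) -> list[FileRecord]:
    """Keep only the records that may be replicated now, in input order."""
    return [r for r in records if eligible_for_replication(r)]


queue = [
    FileRecord("run1.raw", "raw"),
    FileRecord("run1.results", "processed", validated=False),
    FileRecord("run2.results", "processed", validated=True),
]
for record in select_for_replication(queue):
    print(record.name)
```

The same predicate works whether the central repository pushes files or a storage center pulls them, so it does not settle the push or pull question.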
Participants
EUDAT/CSC
• Jani Heikkinen
• Damien Lecarpentier
• Johannes Reetz
EMBL-EBI/systems
• Andy Jenkinson
• Steven Newhouse
BILS
• Mikael Borg
• Fredrik Levander
• Bengt Persson
EMBL-EBI/PRIDE
• Juan Antonio Vizcaíno
• Rui Wang
• Henning Hermjakob
ELIXIR Hub
• Rafael C Jimenez
European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Thank you for your attention
Delegation of raw data
processed data
metadata
Data
repository
PID
Submissions
Search
Integration
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Data storage centers
Meta
data
Raw
Data

Proteomics repositories integration using EUDAT resources

  • 1. European Life Sciences Infrastructure for Biological Information (www.elixir-europe.org). Rafael C Jimenez, ELIXIR CTO. EUDAT conference, 25 September 2014. Proteomics repositories integration using EUDAT resources.
  • 2. Data submissions. Submissions: raw data, processed data, metadata. Data repository. Search. Integration.
  • 3. Overview of shotgun proteomics data production. Peptide sequences: MKKKNIYSIRKLGVG, IASVTLGTLLISG, GVTPAANAAQHD, FYQVLNMPNLNADQ, RNGFIQSLK, DDPSQSANVKLN. Data types: raw data, processed data, metadata. Source: Noble WS, MacCoss MJ (2012) Computational and Statistical Analysis of Protein Mass Spectrometry Data. PLoS Comput Biol 8(1): e1002296. doi:10.1371/journal.pcbi.1002296 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002296
  • 4. Data examples (raw data, processed data, metadata). Metadata: DNA, Human, Liver, Mitochondria, W. Smith, …; Peptide, Mouse, Heart, Nucleus, J. Heinz, …. Sequences: LPISASHSSK…, TTGTTATCCG…, …
  • 5. Proteomics data in PRIDE: ~85% raw data
  • 6. ProteomeXchange (Vizcaíno et al., Nature Biotechnology, 2014): framework to enable standard data submission and dissemination pipelines between the main existing proteomics resources. Diagram labels: ProteomeCentral; Metadata / Manuscript; Raw Data*; Results; Journals; UniProt/NeXtProt; PeptideAtlas; Other DBs; Receiving repositories: PASSEL (SRM data), PRIDE (MS/MS data); GPMDB; Researcher’s results; Reprocessed results; Raw data*; Metadata.
  • 7. PRIDE (PRoteomics IDEntifications) database. Mass spectrometry. Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2013.
  • 8. Origin: 152 USA, 108 Germany, 67 United Kingdom, 53 Switzerland, 48 Netherlands, 42 China, 42 Canada, 41 France, 36 Spain, 33 Belgium, 25 Australia, 23 Sweden, 17 Japan, 16 Denmark, 13 Norway, 12 Finland, 12 India, 12 Taiwan, 10 Italy, 9 Republic of Korea, 8 Austria, 8 Ireland, 8 Brazil, 7 Singapore, 5 Israel, 5 Russia, …
       Type: 273 PRIDE complete, 501 PRIDE partial, 47 PeptideAtlas/PASSEL complete
       Access: 38.3% PRIDE public, 5.3% PASSEL public, 56% PRIDE private, 0.4% PASSEL private
       Data volume: total >40 TB; number of all files >120,000; PXD000320-324: ~5 TB; PXD000065: ~1.4 TB
       Top species studied by at least 8 datasets: 381 Homo sapiens, 100 Mus musculus, 31 Arabidopsis thaliana, 26 Saccharomyces cerevisiae, 16 Escherichia coli, 14 Rattus norvegicus, 12 Mycobacterium tuberculosis, 11 Drosophila melanogaster (~215 species in total)
       Submissions/year: 2012: 102; 2013: 527; 2014: 192
  • 9. Pilot evolution. Use EUDAT: replication of ELIXIR data in EUDAT data centers; delegation of ELIXIR data in EUDAT data centers. Adopt EUDAT: replication of ELIXIR data in ELIXIR data centers using EUDAT technology.
  • 10. Replication of ELIXIR data in EUDAT data centers. Diagram labels: Central repository; Data storage centers; Meta data, Raw Data; Meta data, Results, Raw Data.
  • 11. Replication of ELIXIR data in EUDAT data centers (Vizcaíno et al., Nature Biotechnology, 2014). Diagram labels: ProteomeCentral; Metadata / Manuscript; Raw Data*; Results; Journals; UniProt/NeXtProt; PeptideAtlas; Other DBs; Receiving repositories; GPMDB; Researcher’s results; Reprocessed results; Raw data*; Metadata; Raw Data*; PASSEL (SRM data); PRIDE (MS/MS data).
  • 12. Delegation of ELIXIR data in EUDAT data centers. Diagram labels: Central repository; Data storage centers; Meta data, Raw Data; Meta data, Results, Raw Data.
  • 13. Delegation of ELIXIR data in EUDAT data centers (Vizcaíno et al., Nature Biotechnology, 2014). Diagram labels: ProteomeCentral; Metadata / Manuscript; Raw Data*; Results; Journals; UniProt/NeXtProt; PeptideAtlas; Other DBs; Receiving repositories; GPMDB; Researcher’s results; Reprocessed results; Raw data*; Metadata; Raw Data*; PRIDE (MS/MS data); PASSEL (SRM data).
  • 14. Replication of ELIXIR data in ELIXIR data centers using EUDAT technology. Diagram labels: National proteomics centers (Meta data, Results, Raw Data); Central repository (Meta data, Results, Raw Data).
  • 15. Plans. 1. ELIXIR replication; 2. EUDAT replication. Diagram labels: National proteomics centers (Meta data, Results, Raw Data); Central repository (Meta data, Results, Raw Data); Data storage centers (Meta data, Raw Data).
  • 16. Plans. 3. Delegation. Diagram labels: National proteomics centers (Meta data, Results, Raw Data); Central repository (Meta data, Results, Raw Data); Data storage centers (Meta data, Raw Data).
  • 19. File sharing model. Diagram labels: CSC; BILS; Site B; Site C; EUDAT CDI; ELIXIR; B2SAFE (four instances); PRIDE; EMBL-EBI.
  • 20. Pilot: EUDAT adoption, ELIXIR replication. Diagram labels: CSC; BILS; Site B; Site C; EUDAT CDI; ELIXIR; B2SAFE (four instances); PRIDE; EMBL-EBI; Central repository (Meta data, Results, Raw Data); National proteomics centers (Meta data, Results, Raw Data).
  • 21. PIDs. Diagram labels: ELIXIR community center; ELIXIR Data center 1; EUDAT Data center 1; CSC; PRIDE; BILS.
  • 22. Status
       BILS: migrating from existing Swestore dCache to iRODS; testing compatibility with B2SAFE; latest iRODS not compatible with B2SAFE?
       PRIDE: iRODS service installed; B2SAFE module has been deployed at EMBL-EBI (PRIDE); test B2SAFE replication PRIDE -> CSC; DOI for datasets; PID for dataset files; web service to associate datasets with dataset files
  • 23. Status
       In progress: Handle System registration; test requests of EPIC/EUDAT identifiers
       Open questions: BILS local PIDs? Sync back from PRIDE to BILS for modifications/additions at PRIDE? Data push or pull model? Replication of processed data requires previous validation
  • 24. Participants
       EUDAT/CSC: Jani Heikkinen, Damien Lecarpentier, Johannes Reetz
       EMBL-EBI/systems: Andy Jenkinson, Steven Newhouse
       BILS: Mikael Borg, Fredrik Levander, Bengt Persson
       EMBL-EBI/PRIDE: Juan Antonio Vizcaíno, Rui Wang, Henning Hermjakob
       ELIXIR Hub: Rafael C Jimenez
  • 25. Thank you for your attention. European Life Sciences Infrastructure for Biological Information, www.elixir-europe.org
  • 26. Delegation of raw data. Diagram labels: processed data; metadata; Data repository; PID; Submissions; Search; Integration.
  • 27. Diagram labels (the same set appears twice on the slide): National proteomics centers (Meta data, Results, Raw Data); Central repository (Meta data, Results, Raw Data); Data storage centers (Meta data, Raw Data).
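
The status slides list a DOI per dataset, a PID per dataset file, and a web service to associate datasets with their files, with replication between sites such as PRIDE and CSC. The slides do not describe how that service is built, so the following is only a minimal in-memory sketch of the idea. The class names, method names, and the PID string are hypothetical; only the accession PXD000065 and the site names come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FileRecord:
    """One dataset file: its PID, its name, and the sites holding a copy."""
    pid: str
    name: str
    replicas: Set[str] = field(default_factory=set)


@dataclass
class DatasetRegistry:
    """Hypothetical registry mapping a dataset identifier to its file records."""
    files_by_dataset: Dict[str, Dict[str, FileRecord]] = field(default_factory=dict)

    def register_file(self, dataset_id: str, pid: str, name: str, site: str) -> FileRecord:
        """Associate a file PID with a dataset and record the site that holds it."""
        files = self.files_by_dataset.setdefault(dataset_id, {})
        record = files.setdefault(pid, FileRecord(pid=pid, name=name))
        record.replicas.add(site)
        return record

    def add_replica(self, dataset_id: str, pid: str, site: str) -> None:
        """Record that an already registered file has been replicated to another site."""
        self.files_by_dataset[dataset_id][pid].replicas.add(site)

    def files_for(self, dataset_id: str) -> List[FileRecord]:
        """Return the dataset's file records sorted by PID (empty if unknown)."""
        records = self.files_by_dataset.get(dataset_id, {})
        return sorted(records.values(), key=lambda r: r.pid)

    def sites_for(self, dataset_id: str) -> Set[str]:
        """Return every site that holds at least one file of the dataset."""
        sites: Set[str] = set()
        for record in self.files_for(dataset_id):
            sites |= record.replicas
        return sites


if __name__ == "__main__":
    registry = DatasetRegistry()
    # "example-prefix/file-0001" is a made-up PID, not a real Handle or EPIC identifier.
    registry.register_file("PXD000065", "example-prefix/file-0001", "run1.raw", "PRIDE")
    registry.add_replica("PXD000065", "example-prefix/file-0001", "CSC")
    print(sorted(registry.sites_for("PXD000065")))
```

A real service would sit on top of a PID resolver and persistent storage; the sketch only shows the dataset-to-file association and the bookkeeping of replica locations.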

Editor's Notes

  1. Proteomics is the large-scale study of proteins, particularly their structures and functions. Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge (m/z) ratio of charged particles.
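
As a small worked illustration of the m/z ratio mentioned in the note above, the sketch below computes m/z for a protonated ion. It assumes positive-mode ionization in which each unit of charge comes from one added proton, which the note itself does not specify; the function name and the rounded proton mass constant are introduced here for illustration.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons, rounded (assumed constant)


def mz_of_protonated_ion(neutral_mass_da: float, charge: int) -> float:
    """Return m/z for a molecule of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Each added proton contributes both one unit of charge and its own mass.
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


if __name__ == "__main__":
    # The same 1000 Da molecule appears at different m/z values for different charges.
    for z in (1, 2, 3):
        print(z, round(mz_of_protonated_ion(1000.0, z), 4))
```

The same molecule therefore produces signals at several m/z values, one per charge state, which is why the ratio rather than the mass alone is what the instrument reports.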