How to run and maintain a popular biological data repository?
Dr. Juan Antonio Vizcaíno
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
E-mail: juan@ebi.ac.uk
Juan A. Vizcaíno
juan@ebi.ac.uk
Danish Bioinformatics Conference
Odense, 23 August 2017
Data resources at EMBL-EBI
• Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal
• Gene & protein expression: ArrayExpress, Expression Atlas, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
PRIDE (PRoteomics IDEntifications) Archive
• PRIDE stores mass spectrometry (MS)-based proteomics data:
• Peptide and protein expression data (identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
• Any type of data can be stored.
• Since July 2017, an ELIXIR core resource.
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
Infrastructure and personnel are key (1)
Computing infrastructure is very expensive to maintain.
(We are very lucky to be at the EBI).
Infrastructure and personnel are key (2)
Expert staff members are essential (as in any other business):
- It is not easy to find good people with the required skills
- Complementary skills are essential
- Managing people as a team is challenging
At an early stage
• Promote your resource as much as possible and demonstrate that you can provide an essential service reliably
• Interaction with scientific journals was essential for PRIDE:
• Scientists are busy people
• Journal and funder requirements are pushing for open data
• Develop user-friendly tools: submitting and accessing data must be very easy.
• Demonstrate that the team does, and enables, good science
• Publications, conferences/workshops.
At a more advanced stage
• Maintain active communication with some wet labs and with the community as a whole (finding the right balance)
• New ideas often come from a simple chat
• Conferences and workshops
• Training courses
• Work actively together with your collaborators/competitors
to look for synergies:
• ProteomeXchange
ProteomeXchange: a global, distributed proteomics database
• Member repositories: PRIDE (MS/MS data), PASSEL (SRM data), MassIVE (MS/MS data) and jPOST (MS/MS data, new in 2016)
• Data shared: raw data (Raw), identification/quantification results (ID/Q) and metadata (Meta)
• Mandatory raw data deposition since July 2015
• Goal: development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
• An international consortium is better than a single resource (perception, funding, etc.)
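The ProteomeXchange slide names the member repositories, the kinds of data they exchange and the rule that raw data deposition is mandatory since July 2015. As a purely illustrative Python sketch of a submission completeness check against those points: the `Submission` record, the `missing_mandatory()` function and the choice to treat raw data as the only mandatory component are hypothetical simplifications, not ProteomeXchange or PRIDE software and not the consortium's full submission rules.

```python
from dataclasses import dataclass, field

# Member repositories and the data type each handles, as listed on the slide.
MEMBERS = {
    "PRIDE": "MS/MS",
    "PASSEL": "SRM",
    "MassIVE": "MS/MS",
    "jPOST": "MS/MS",
}

# Components shown on the slide: "Raw", "ID/Q" and "Meta".
# Hypothetical simplification: only raw data is checked as mandatory here.
MANDATORY = {"Raw"}


@dataclass
class Submission:
    """Hypothetical, simplified record of a dataset submission."""
    repository: str
    components: set = field(default_factory=set)


def missing_mandatory(sub: Submission) -> set:
    """Return the mandatory components absent from a submission.

    Raises ValueError if the target repository is not one of the
    members listed on the slide.
    """
    if sub.repository not in MEMBERS:
        raise ValueError(f"{sub.repository!r} is not a listed member repository")
    return MANDATORY - sub.components


if __name__ == "__main__":
    complete = Submission("PRIDE", {"Raw", "ID/Q", "Meta"})
    no_raw = Submission("MassIVE", {"ID/Q", "Meta"})
    print(missing_mandatory(complete))  # set()
    print(missing_mandatory(no_raw))    # {'Raw'}
```

Returning the set of missing components, rather than a plain yes/no, lets a submission tool tell the submitter exactly what still needs to be uploaded.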
Running PRIDE every day -> Sustainability
• Keep our clients happy!
• (People usually only say something when something is not working!)
• Make things as easy as possible in terms of software development:
• Long-term maintainability is key
• Good software practices, e.g. documentation
• Community-based data standards are essential.
• Open source tools, open data.
• Find the right balance between maintenance and doing new things.
HUPO Proteomics Standards Initiative
http://www.psidev.info
• Develops data standards for proteomics
• Both data representation and annotation standards
• Involves data producers, database providers, software producers, publishers and everyone else who wants to be involved
• Active workgroups: MI, MS, PI, Mod and the new QC
• Inter-group activities: MIAPE and Controlled Vocabularies
• Started in 2002, so there is already considerable experience
• One annual meeting in March-April, plus regular phone calls
• Close interaction with the metabolomics community (MSI)
Running PRIDE every day -> Sustainability (2)
• Running a resource like PRIDE is expensive:
• Constantly looking for funding
• Take opportunities to collaborate, looking for synergies:
• a) with other resources:
• UniProt
• Genomics resources (e.g. Ensembl)
• Metabolomics resources
• b) with other researchers in the field.
Acknowledgements: The PRIDE Team
Attila Csordas
Tobias Ternent
Mathias Walzer
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob
All data submitters!
@pride_ebi
@proteomexchange