Yasset Perez-Riverol Ph.D
PRIDE Team
Systematic integration of millions of
peptidoform evidences into Ensembl and
other genome browsers.
Meeting Sanger-EBI team
EMBL-EBI, November 2017
PRIDE Proteogenomics
• Provides a TrackHub in ENSEMBL for every ProteomeXchange COMPLETE submission.
• Provides a global TrackHub in ENSEMBL for all PRIDE peptide evidences.
PX Complete Trackhub in ENSEMBL
[Diagram: (1) PX Submission Tool -> (2) PRIDE submission pipelines -> PRIDE Archive -> (3) PRIDE Archive Web and API -> (4) TrackHub Registry.]
1. A PX submission can be Partial or Complete:
• Partial Submission: RAW data, SEARCH results and peak lists.
• Complete Submission: RAW data, result files, peak lists and SEARCH results.
3. Each PX submission can be searched by:
• Title, Metadata, Description, Tissue, Taxonomy, PTMs.
• Peptide Sequence or Protein Identifier.
4. The TrackHub Registry can search tracks by:
• ShortLabel, LongLabel.
• OmicsType: Proteomics, Genomics, Transcriptomics.
2. PRIDE Submission Pipelines
[Diagram: for the assays of a PX Complete Submission, the mzid files and peak lists are converted (5) to mztab and mgf lists, which are then stored (6).]
5. Convert to mztab/mgf and filter out the evidences that do not pass the reported mzid threshold.
6. Store the Project Metadata, Peptide Sequences and Protein Identifiers in Solr and MongoDB.
[Diagram: TrackHub Generation. For each PX Complete Submission, the peptides of each assay are exported to a Pogo file; tracks are generated by taxonomy and the TrackHub is published in the TrackHub Registry.]
Generating reliable peptide tables
Current filter options:
• 1% FDR at PSM level (combined results)
• 1% FDR at peptide level (combined results)
Possible filters (HPP):
• > 8 AA
• 1% FDR at transcript level (inference needed)
Combined PSM score:
- Same spectrum, same peptide
- Different search engines
Combined peptide score:
- Same peptide
- Different PSMs
Experiment    Peptide                                    PSMs   Quant
Assay 34642   APPLLEGAPFR                                1      1.000000
Assay 34644   APPLLEGAPFR                                1      1.000000
Assay 34642   THTQDAVPLTLGQEFSGYVQQVQYAM(oxidation)VR    1      1.000000
Assay 34645   KKQVM(oxidation)EK                         1      1.000000
Assay 34645   VGSGDTNNFPYLEK                             2      2.000000
Assay 34645   SLTYLSILR                                  3      3.000000
Assay 34642   LPFTPLSYIQGLSHR                            8      8.000000
Each assay links to its PRIDE Archive page, e.g. http://www.ebi.ac.uk/pride/archive/assays/34642
Audain et al., J Proteomics 2017
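A minimal sketch of how a peptide table like the one above could be assembled from a list of PSMs. This is not the PRIDE pipeline code: the function names, the input shape (one `(assay_id, peptidoform)` tuple per PSM) and the `min_length` parameter are assumptions made for illustration. The optional length filter mirrors the "> 8 AA" option listed under the possible HPP filters.

```python
import re
from collections import Counter


def strip_modifications(peptidoform):
    """Remove inline modification labels such as '(oxidation)'."""
    return re.sub(r"\([^)]*\)", "", peptidoform)


def build_peptide_table(psms, min_length=None):
    """Count PSMs per (assay, peptidoform) pair.

    psms: iterable of (assay_id, peptidoform) tuples, one per PSM (assumed shape).
    min_length: hypothetical optional filter on the unmodified sequence length.
    """
    counts = Counter()
    for assay_id, peptidoform in psms:
        sequence = strip_modifications(peptidoform)
        if min_length is not None and len(sequence) < min_length:
            continue  # peptide too short, skip it
        counts[(assay_id, peptidoform)] += 1
    # One row per assay/peptidoform: (assay, peptide, number of PSMs)
    return [(assay, pep, n) for (assay, pep), n in sorted(counts.items())]
```

Calling `build_peptide_table(psms, min_length=9)` keeps only peptides longer than 8 residues; leaving `min_length` unset keeps every peptidoform.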
[Figure: Protein Inference Toolkit. Nodes A to J and P1 to P5 are grouped under PR1 to illustrate Protein Groups.]
Mapping peptides to ENSEMBL: PoGo
Schlaffner CN, Pirklbauer G, Bender A, Choudhary JS. PoGo: Jumping from Peptides to Genomic Loci. bioRxiv (2016)
For each .pogo file:
• PTMs are standardized to a common representation using the PRIDE-Mod library.
• Each peptide references an Assay URL in PRIDE.
• Each Pogo file is generated automatically by the PRIDE Pipeline.
chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
chr1 1486636 1486663 LFDWANTSR 1000 + 1486636 1486636 128,128,128 1 27 0
chr1 1490572 1490596 LAQFDYGR 1000 + 1490572 1490572 128,128,128 1 24 0
chr1 1522863 1522887 ITVLEALR 1000 + 1522863 1522863 128,128,128 1 24 0
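The lines above have twelve whitespace-separated fields, which appears to match the standard BED12 layout (chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts). Under that assumption, a small parser can be sketched as follows; `BedRecord` and `parse_bed12` are illustrative names, not part of PoGo.

```python
from typing import List, NamedTuple, Tuple


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str  # here: the peptide sequence
    score: int
    strand: str
    thick_start: int
    thick_end: int
    item_rgb: Tuple[int, ...]
    block_count: int
    block_sizes: List[int]
    block_starts: List[int]


def _int_list(field):
    """Parse a comma-separated integer list, tolerating a trailing comma."""
    return [int(x) for x in field.split(",") if x]


def parse_bed12(line):
    """Parse one BED12 line into a BedRecord (assumes exactly 12 fields)."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    return BedRecord(
        chrom=f[0], start=int(f[1]), end=int(f[2]), name=f[3],
        score=int(f[4]), strand=f[5],
        thick_start=int(f[6]), thick_end=int(f[7]),
        item_rgb=tuple(_int_list(f[8])),
        block_count=int(f[9]),
        block_sizes=_int_list(f[10]),
        block_starts=_int_list(f[11]),
    )
```

For the single-block records shown above, the genomic span equals three nucleotides per residue, e.g. VLIPVFALGR (10 residues) spans 1314365 - 1314335 = 30 bases.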
Future challenges:
• BED information can be extended with more information about transcript reliability.
• Peptide uniqueness.
• Reliability score.
• Native bigBed output should be provided to remove the need for customizing new pipelines, etc.
• What to do with the unmapped peptides (which are long lists).
• Maintainability.
Getting there..
Taxonomy                   Complete Submissions (PRIDE)   TrackHub Creation
Human                      876                            74
Mouse                      252
Arabidopsis thaliana       127
Rat                        68                             4
Yeast                      62
E. coli                    38
Bos taurus                 32
Drosophila melanogaster    21
Zebrafish                  20
Zea mays                   15
Candida albicans           12
Gallus gallus              12
Caenorhabditis elegans     10
Horse                      10
Synechocystis sp. PCC      9
Glycine max                9
TrackHub creation and Publication
[Diagram: Pogo files and taxonomies, together with trackhub-parameters.json, feed TrackHub Creation, which produces the tracks of a TrackHub that is then published in the TrackHub Registry (steps 1 to 4).]
TrackHub Creator:
Python framework (https://github.com/Proteogenomics/trackhub-creator).
1. Interacts with the ENSEMBL API to retrieve the latest ENSEMBL release and assembly version for all species supported by PoGo.
2. Downloads the corresponding GTF and FASTA (protein file) for each species supported by PoGo.
3. Using the PoGo source code, compiles PoGo and generates the bed/bigBed files.
4. Creates/updates the TrackHub for a specific PX submission. A new library has been developed to interact with the ENSEMBL TrackHub Registry (https://github.com/PRIDE-Utilities/track-hub-registrator).
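As a rough illustration of the download step, the sketch below builds the GTF and protein FASTA locations for one species. It assumes the directory and file naming used on the public ENSEMBL FTP site; it is not taken from the trackhub-creator code, and the function name and the example release number are placeholders. In practice the release and assembly would come from the ENSEMBL API, as described above.

```python
def ensembl_ftp_paths(species, assembly, release,
                      base="ftp://ftp.ensembl.org/pub"):
    """Build GTF and protein FASTA URLs for one species.

    The path layout is an assumption based on the public ENSEMBL FTP site.
    species:  scientific name, e.g. "Homo sapiens"
    assembly: assembly name, e.g. "GRCh38"
    release:  ENSEMBL release number (example value only)
    """
    name = species.strip().lower().replace(" ", "_")  # homo_sapiens
    prefix = name.capitalize()                        # Homo_sapiens
    gtf = (f"{base}/release-{release}/gtf/{name}/"
           f"{prefix}.{assembly}.{release}.gtf.gz")
    pep = (f"{base}/release-{release}/fasta/{name}/pep/"
           f"{prefix}.{assembly}.pep.all.fa.gz")
    return {"gtf": gtf, "protein_fasta": pep}
```

Keeping URL construction separate from the actual download makes it easy to check the generated paths before any network access happens.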
Trackhub creator
• Download the ENSEMBL protein FASTA file and GTF for a specific taxonomy: detect the taxonomy in PRIDE, automatically query the ENSEMBL API and download the data from it.
• Run PoGo and generate the mapping files: bed files for peptides and modified peptides (ongoing: we will automatically generate the 1 and 2 gaps files).
• Generate the TrackHub with a specific structure (next section).
• Publish the TrackHub in the ENSEMBL Registry (https://www.trackhubregistry.org/search).
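The hub structure itself is not shown here, so the following is only a sketch of the three text files that the UCSC track hub convention expects (hub.txt, genomes.txt and one trackDb.txt per assembly). The function names, labels and the dictionary keys are hypothetical; only the file keywords (`hub`, `genomesFile`, `genome`, `trackDb`, `bigDataUrl`, `type bigBed`) come from the track hub format.

```python
def hub_txt(hub, short_label, long_label, email):
    """Content of hub.txt, pointing at genomes.txt."""
    return "\n".join([
        f"hub {hub}",
        f"shortLabel {short_label}",
        f"longLabel {long_label}",
        "genomesFile genomes.txt",
        f"email {email}",
    ]) + "\n"


def genomes_txt(assemblies):
    """Content of genomes.txt: one stanza per assembly (UCSC names)."""
    stanzas = [f"genome {a}\ntrackDb {a}/trackDb.txt\n" for a in assemblies]
    return "\n".join(stanzas)


def track_db_txt(tracks):
    """Content of trackDb.txt for a list of bigBed tracks.

    tracks: list of dicts with the (hypothetical) keys
            'track', 'big_data_url', 'short_label', 'long_label'.
    """
    stanzas = []
    for t in tracks:
        stanzas.append("\n".join([
            f"track {t['track']}",
            f"bigDataUrl {t['big_data_url']}",
            f"shortLabel {t['short_label']}",
            f"longLabel {t['long_label']}",
            "type bigBed 12",
            "visibility pack",
        ]) + "\n")
    return "\n".join(stanzas)
```

For a human hub, `genomes_txt(["hg38"])` yields a `genome hg38` line followed by `trackDb hg38/trackDb.txt`.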
Note: We have faced millions of problems:
1- ENSEMBL and UCSC support only UCSC notation for almost everything:
genome hg38
trackDb hg38/trackDb.txt
Chromosomes should be chr1, chr2, etc.
Chromosome sizes are programmatically available.
2- Taxonomies were not supported in PoGo. We have now moved to a better list with 19 taxonomies.
3- Modifications have been a nightmare between UNIMOD and PoGo.
http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/archive/
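The chromosome naming issue in point 1 can be illustrated with a small helper that rewrites ENSEMBL-style names (1, 2, X, MT) into UCSC-style names (chr1, chr2, chrX, chrM). This is a sketch under the assumption that only numbered chromosomes, X, Y and the mitochondrial genome need to be handled; the function name is hypothetical, and any other sequence name is rejected rather than guessed.

```python
def to_ucsc_chrom(ensembl_name):
    """Convert an ENSEMBL chromosome name to UCSC notation.

    Handles numbered chromosomes, X, Y and MT only (an assumption);
    raises ValueError for anything else, e.g. scaffold identifiers.
    """
    name = ensembl_name.strip()
    if name.startswith("chr"):
        return name  # already in UCSC notation
    if name == "MT":
        return "chrM"  # UCSC names the mitochondrial genome chrM
    if name in {"X", "Y"} or name.isdigit():
        return "chr" + name
    raise ValueError(f"no simple UCSC equivalent for {ensembl_name!r}")
```

Raising an error for unknown names keeps unmapped sequences visible instead of silently writing records that a genome browser would not recognise.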
Getting there..
http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm10&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr12%3A20021505-100107519&hgsid=644445905_BScrPMQymPnGtl9O7jeSv1jS5Rm0
PRIDE Proteogenomics
• Provides a TrackHub in ENSEMBL for every ProteomeXchange COMPLETE submission.
• Provides a global TrackHub in ENSEMBL for all PRIDE peptide evidences.
PRIDE Data Growth
PRIDE Cluster Approach
“When merging a large number of datasets coming from different groups/bioinformatics workflows, the consensus spectra should be generated to evaluate the accuracy of each independent dataset.”
We know results depend on: database, search engine settings, bioinformatics pipeline, etc.
Assay-24261
Assay-3394
Peptide Mascot Score: 54.06
Peptide Mascot Score: 32.86
GYTFSTTAER: 3 PSMs
GYSFTTTAER: 13918 (143)
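One way to read the figures above is as PSM counts for two competing identifications within the same spectrum cluster; this is an interpretation, since the original figure is not available. Under that assumption, a consensus identification can be sketched by taking the peptide that accounts for the largest share of PSMs in the cluster. The function names and the `min_ratio` threshold are hypothetical and do not describe the PRIDE Cluster implementation.

```python
def cluster_peptide_ratios(psm_counts):
    """Fraction of a cluster's PSMs assigned to each peptide."""
    total = sum(psm_counts.values())
    if total == 0:
        raise ValueError("cluster has no identified spectra")
    return {pep: n / total for pep, n in psm_counts.items()}


def consensus_peptide(psm_counts, min_ratio=0.7):
    """Return the dominant peptide of a cluster, or None.

    min_ratio is a hypothetical cut-off: the dominant peptide must
    account for at least this fraction of the cluster's PSMs.
    """
    ratios = cluster_peptide_ratios(psm_counts)
    best = max(ratios, key=ratios.get)
    return best if ratios[best] >= min_ratio else None
```

With the counts shown above, `consensus_peptide({"GYTFSTTAER": 3, "GYSFTTTAER": 13918})` selects GYSFTTTAER, because it accounts for almost all PSMs in the cluster.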
PRIDE Cluster Pipeline
[Diagram: PX Complete submissions (1 .. n) -> PRIDE Archive Import -> mgf (annotated spectra) -> Clustering (Hadoop Cluster) -> Clustering Files -> Peptide Export -> Peptide Tables (Pogo File) -> TrackHub Generation (tracks by taxonomy) -> TrackHub Registry.]
QC after PRIDE Archive Import:
• PX successfully converted
• New Peptides/PTMs
• Number of identified and non-identified spectra
QC after clustering:
• Number of new clusters
• PRIDE Cluster score distribution
• Number of clusters by modification
QC after Peptide Export:
• Number of peptides
• Number of new peptides
• Number of PTMs
• Number of new PTMs
PRIDE Cluster Peptide Evidences
[Figure panels: Human, Mouse, Arabidopsis]
The Peptide Evidence Files from PRIDE Cluster collapse all the peptide evidences from reliable clusters:
ftp://ftp.pride.ebi.ac.uk/pride/data/cluster/peptide-results/2015-04/
All trackhubs already published
Note: Some challenges ahead:
1- Taxonomies like Saccharomyces cerevisiae are not supported in PoGo.
2- Scaffolds are not supported in PoGo with the UCSC notation. The renaming that we apply to chromosomes (1, 2, ... to chr1, chr2, etc.) cannot be applied to scaffolds.
3- More evidences are needed.
Getting there … (PepBed package)
• Black (all identified
peptides).
• Cyan (oxidation)
• Orange (acetyl)
• Red (phospho)
Getting there … (Comparison with APPRIS Peptidome)
APPRIS Peptidome:
• The eight studies covered a huge range of tissues and cell types:
• The peptides from the PeptideAtlas database cover 51 different tissues, cell types, and developmental stages (2016).
• The Geiger study interrogated 11 different cell types.
• NIST database.
• The peptides in the Kim and Wilhelm analyses were generated from 30 and 35 distinct tissue types, respectively (51 tissues in total).
PRIDE Team
• Manuel Bernal-Llinares (track-hub creator)
• Tobias Ternent (PRIDE pipelines)
• Johannes Griss (PRIDE Cluster pipelines)
• Juan A. Vizcaino (PI)
Sanger Team
• Christoph Schlaffner (PoGo tool)
• Jyoti Choudhary (PI)
ENSEMBL Team
• Alessandro Vullo (trackhub registry)

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified Gravity
 

Systematic integration of millions of peptidoform evidences into Ensembl and other genome browsers

  • 1. Yasset Perez-Riverol Ph.D PRIDE Team Systematic integration of millions of peptidoform evidences into Ensembl and other genome browsers.
  • 2. Meeting Sanger-EBI team, EMBL-EBI, November 2017. PRIDE Proteogenomics:
      • Provides a TrackHub in ENSEMBL for every ProteomeXchange COMPLETE submission.
      • Provides a global TrackHub in ENSEMBL for all PRIDE peptide evidences.
  • 3. PX Complete TrackHub in ENSEMBL
      Workflow: (1) PX Submission Tool → (2) PRIDE Submission Pipelines → (3) PRIDE Archive Web and API → (4) TrackHub Registry.
      1. A PX submission can be Partial or Complete:
           Partial Submission: RAW data, SEARCH Results and Peak Lists.
           Complete Submission: RAW data, Result Files and Peak Lists, SEARCH Results.
      2. PRIDE Submission Pipelines, applied to the assays of each PX Complete submission:
           5. Convert mzid and peak lists to mztab/mgf and filter out evidences that do not pass the reported mzid threshold.
           6. Store the Project Metadata, Peptide Sequences and Protein Identifiers in Solr and MongoDB.
           A peptide PoGo file is then produced per assay, and TrackHub generation groups the tracks by taxonomy before publishing to the TrackHub Registry.
      3. Each PX submission can be searched by:
           Title, Metadata, Description, Tissue, Taxonomy, PTMs.
           Peptide Sequence or Protein Identifier.
      4. The TrackHub Registry can search tracks by:
           ShortLabel, LongLabel.
           OmicsType: Proteomics, Genomics, Transcriptomics.
  • 4. Generating reliable peptide tables
      Current filter options:
        • 1% FDR at PSM level (combined results)
        • 1% FDR at peptide level (combined results)
      Possible filters (HPP):
        • > 8 AA
        • 1% FDR at transcript level (inference needed)
      Combined PSM score: same spectrum and peptide, different search engines.
      Combined peptide score: same peptide, different PSMs.
      Example peptide table (Assay N links to http://www.ebi.ac.uk/pride/archive/assays/N):
        Experiment    Peptide                                   PSMs   Quant
        Assay 34642   APPLLEGAPFR                               1      1.000000
        Assay 34644   APPLLEGAPFR                               1      1.000000
        Assay 34642   THTQDAVPLTLGQEFSGYVQQVQYAM(oxidation)VR   1      1.000000
        Assay 34645   KKQVM(oxidation)EK                        1      1.000000
        Assay 34645   VGSGDTNNFPYLEK                            2      2.000000
        Assay 34645   SLTYLSILR                                 3      3.000000
        Assay 34642   LPFTPLSYIQGLSHR                           8      8.000000
      [Figure: Protein Inference Toolkit, peptides grouped into proteins and protein groups. Audain et al., J Proteomics 2017]
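The "> 8 AA" filter listed under the possible HPP filters can be sketched in a few lines. This is only an illustration under stated assumptions: the rows are copied from the example table on the slide, while the helper names `bare_sequence` and `passes_length_filter` are hypothetical and are not part of the PRIDE pipelines. Inline modification names such as "(oxidation)" are removed before counting residues.

```python
import re

# Rows copied from the slide: (assay id, peptide with inline modifications, PSM count).
ROWS = [
    (34642, "APPLLEGAPFR", 1),
    (34644, "APPLLEGAPFR", 1),
    (34642, "THTQDAVPLTLGQEFSGYVQQVQYAM(oxidation)VR", 1),
    (34645, "KKQVM(oxidation)EK", 1),
    (34645, "VGSGDTNNFPYLEK", 2),
    (34645, "SLTYLSILR", 3),
    (34642, "LPFTPLSYIQGLSHR", 8),
]


def bare_sequence(peptide):
    """Drop inline modification names such as '(oxidation)' (hypothetical helper)."""
    return re.sub(r"\([^)]*\)", "", peptide)


def passes_length_filter(peptide, min_exclusive=8):
    """Keep peptides with strictly more than `min_exclusive` residues (hypothetical helper)."""
    return len(bare_sequence(peptide)) > min_exclusive


# Only KKQVM(oxidation)EK (7 residues) fails the filter in this example.
kept = [row for row in ROWS if passes_length_filter(row[1])]
```

The FDR-based filters on the slide need score distributions that are not shown here, so they are left out of the sketch.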
  • 5. Mapping peptides to ENSEMBL: PoGo
      Schlaffner CN, Pirklbauer G, Bender A, Choudhary JS. PoGo: Jumping from Peptides to Genomic Loci. bioRxiv (2016).
      For each .pogo file:
        • PTMs are standardized to a common representation using the PRIDE-Mod library.
        • Each peptide references an Assay URL in PRIDE.
        • Each PoGo file is generated automatically by the PRIDE pipeline.
      Example BED output:
        chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
        chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
        chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
        chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
        chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
        chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
        chr1 1486636 1486663 LFDWANTSR 1000 + 1486636 1486636 128,128,128 1 27 0
        chr1 1490572 1490596 LAQFDYGR 1000 + 1490572 1490572 128,128,128 1 24 0
        chr1 1522863 1522887 ITVLEALR 1000 + 1522863 1522863 128,128,128 1 24 0
      Challenges for the future:
        • The BED information can be extended with more information about transcript reliability:
            • Peptide uniqueness.
            • Reliability score.
        • Native bigBed output should be provided, to avoid customizing new pipelines.
        • What to do with the unmapped peptides, which form long lists.
        • Maintainability.
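The BED rows above follow the standard 12-column BED layout, so they can be read with a small parser. The sketch below is an assumption-laden illustration, not PoGo's or PRIDE's own code: `BedRecord`, `parse_bed_line` and the field names are hypothetical, and the input is the nine rows shown on the slide. Counting loci per peptide gives a rough view of the "peptide uniqueness" challenge, since a peptide mapped to several loci is not unique.

```python
from collections import Counter, namedtuple

# Field names follow the usual BED12 column order (names chosen for this sketch).
BedRecord = namedtuple(
    "BedRecord",
    "chrom start end name score strand thick_start thick_end "
    "item_rgb block_count block_sizes block_starts",
)


def int_list(text):
    """Turn a comma-separated BED field such as '128,128,128' into a tuple of ints."""
    return tuple(int(part) for part in text.split(",") if part)


def parse_bed_line(line):
    """Parse one whitespace-separated BED12 line into a BedRecord."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 BED columns, got {len(fields)}")
    return BedRecord(
        chrom=fields[0], start=int(fields[1]), end=int(fields[2]),
        name=fields[3], score=int(fields[4]), strand=fields[5],
        thick_start=int(fields[6]), thick_end=int(fields[7]),
        item_rgb=int_list(fields[8]), block_count=int(fields[9]),
        block_sizes=int_list(fields[10]), block_starts=int_list(fields[11]),
    )


POGO_LINES = """\
chr1 1314335 1314365 VLIPVFALGR 1000 - 1314335 1314335 0,0,0 1 30 0
chr1 1454464 1454488 ITVLEALR 1000 + 1454464 1454464 128,128,128 1 24 0
chr1 1456317 1456344 LFDWANTSR 1000 + 1456317 1456317 128,128,128 1 27 0
chr1 1459184 1459211 ATLNAFLYR 1000 + 1459184 1459184 128,128,128 1 27 0
chr1 1462609 1462633 LAQFDYGR 1000 + 1462609 1462609 128,128,128 1 24 0
chr1 1485135 1485159 ITVLEALR 1000 + 1485135 1485135 128,128,128 1 24 0
chr1 1486636 1486663 LFDWANTSR 1000 + 1486636 1486636 128,128,128 1 27 0
chr1 1490572 1490596 LAQFDYGR 1000 + 1490572 1490572 128,128,128 1 24 0
chr1 1522863 1522887 ITVLEALR 1000 + 1522863 1522863 128,128,128 1 24 0
""".splitlines()

records = [parse_bed_line(line) for line in POGO_LINES]

# Number of genomic loci each peptide maps to in this excerpt.
loci_per_peptide = Counter(record.name for record in records)
single_locus = sorted(p for p, n in loci_per_peptide.items() if n == 1)
```

In this excerpt ITVLEALR maps to three loci, while VLIPVFALGR and ATLNAFLYR each map to one; a real uniqueness annotation would of course need the whole genome, not nine rows.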
  • 6. Getting there..
      Taxonomy                   Complete submissions in PRIDE   TrackHubs created
      Human                      876                             74
      Mouse                      252                             -
      Arabidopsis thaliana       127                             -
      Rat                        68                              4
      Yeast                      62                              -
      E. coli                    38                              -
      Bos taurus                 32                              -
      Drosophila melanogaster    21                              -
      Zebrafish                  20                              -
      Zea mays                   15                              -
      Candida albicans           12                              -
      Gallus gallus              12                              -
      Caenorhabditis elegans     10                              -
      Horse                      10                              -
      Synechocystis sp. PCC      9                               -
      Glycine max                9                               -
  • 7. TrackHub creation and publication
      [Diagram: trackhub-parameters.json, PoGo files and taxonomies feed TrackHub creation; the resulting TrackHub is published to the TrackHub Registry.]
      TrackHub Creator: Python framework.
        • Interacts with the ENSEMBL API to retrieve the latest ENSEMBL release and assembly version for all species supported by PoGo.
        • Downloads the corresponding GTF and FASTA (protein file) for each species supported by PoGo.
        • Compiles PoGo from its source code and generates the bed/bigBed files.
        • Creates or updates the TrackHub for a specific PX submission.
        • A new library has been developed to interact with the ENSEMBL TrackHub Registry (https://github.com/PRIDE-Utilities/track-hub-registrator).
      https://github.com/Proteogenomics/trackhub-creator
  • 8. TrackHub creator
      • Download the ENSEMBL FASTA protein file and GTF for a specific taxonomy: detect the taxonomy in PRIDE, then automatically query the ENSEMBL API and download the data from it.
      • Run PoGo and generate the mapping files: bed files for peptides and modified peptides (ongoing; we will automatically generate the 1 and 2 gaps files).
      • Generate the TrackHub with a specific structure (next session).
      • Publish the TrackHub in the ENSEMBL Registry (https://www.trackhubregistry.org/search).
      Note: we have faced many problems:
        1. ENSEMBL and UCSC support only UCSC notation for almost everything, for example:
             genome hg38
             trackDb hg38/trackDb.txt
           Chromosomes should be named chr1, chr2, etc. Chromosome sizes are programmatically available.
        2. Some taxonomies were not supported in PoGo. We have now moved to a better list with 19 taxonomies.
        3. Mapping modifications between UNIMOD and PoGo has been a nightmare.
      http://ftp.pride.ebi.ac.uk/pride/data/proteogenomics/latest/archive/
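The UCSC naming constraint in the note above can be sketched as two small helpers. This is a minimal illustration, assuming human-style chromosome names (1 to 22, X, Y, MT); `to_ucsc_chrom` and `genomes_stanza` are hypothetical names and are not taken from trackhub-creator. The sketch deliberately refuses scaffold names, because a "chr" prefix alone does not give a valid UCSC name for them and a per-assembly lookup table would be needed instead.

```python
def to_ucsc_chrom(ensembl_name):
    """Convert an ENSEMBL-style chromosome name to UCSC notation (hypothetical helper)."""
    if ensembl_name.startswith("chr"):
        return ensembl_name  # already in UCSC notation
    if ensembl_name == "MT":
        return "chrM"  # UCSC names the mitochondrial genome chrM
    if ensembl_name.isdigit() or ensembl_name in {"X", "Y"}:
        return "chr" + ensembl_name
    # Scaffolds and other names cannot be converted by adding a prefix.
    raise ValueError(f"no prefix-based UCSC name for {ensembl_name!r}")


def genomes_stanza(assembly):
    """Build the genomes.txt stanza for one assembly, as in the hg38 example."""
    return f"genome {assembly}\ntrackDb {assembly}/trackDb.txt\n"


print(to_ucsc_chrom("1"), to_ucsc_chrom("MT"))
print(genomes_stanza("hg38"), end="")
```

Raising an error for unknown names, rather than guessing, keeps unmappable scaffolds visible so they can be counted or handled separately.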
  • 9. Getting there.. http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm10&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr12%3A20021505-100107519&hgsid=644445905_BScrPMQymPnGtl9O7jeSv1jS5Rm0
  • 10. Getting there..
  • 11. PRIDE Proteogenomics
      • Provides a TrackHub in ENSEMBL for every ProteomeXchange COMPLETE submission.
      • Provides a global TrackHub in ENSEMBL for all PRIDE peptide evidences.
  • 12. PRIDE Data Growth
  • 13. PRIDE Cluster Approach
      "When merging large number of datasets coming from different groups/bioinformatics workflows the consensus spectra should be generated to evaluate the accuracy of each independent dataset."
      We know results depend on: database, search engine settings, bioinformatics pipeline, etc.
      Example: Assay-24261 and Assay-3394; Peptide Mascot Scores 54.06 and 32.86; GYTFSTTAER: 3 PSMs; GYSFTTTAER: 13918 (143).
  • 14. PRIDE Cluster Pipeline
      [Diagram: PX Complete submissions (1..n) → PRIDE Archive import → mgf (annotated spectra) → Hadoop cluster → clustering files → peptide export → peptide tables (PoGo file) → TrackHub generation by taxonomy → TrackHub Registry.]
      QC metrics shown along the pipeline:
        • PX successfully converted; new peptides/PTMs; number of identified and non-identified spectra.
        • Number of new clusters; PRIDE Cluster score distribution; number of clusters by modification.
        • Number of peptides; number of new peptides; number of PTMs; number of new PTMs.
  • 15. PRIDE Cluster Peptide Evidences (Human, Mouse, Arabidopsis)
      The Peptide Evidence Files from PRIDE Cluster collapse all the peptide evidences from reliable clusters: ftp://ftp.pride.ebi.ac.uk/pride/data/cluster/peptide-results/2015-04/
  • 16. All trackhubs already published
      Note: some challenges ahead:
        1. Taxonomies like Saccharomyces cerevisiae are not supported in PoGo.
        2. Scaffolds are not supported in PoGo with the UCSC notation. The renaming we apply to chromosomes (1, 2, ... to chr1, chr2, ...) cannot be applied to scaffolds.
        3. More evidences are needed.
  • 17. Getting there … (PepBed package)
      • Black (all identified peptides)
      • Cyan (oxidation)
      • Orange (acetyl)
      • Red (phospho)
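The colour scheme above maps naturally onto the itemRgb column of a BED file. The sketch below is an illustration only: the RGB triplets are assumed conventional values for the named colours, not values confirmed by the PepBed package, and `item_rgb_for` is a hypothetical helper.

```python
# Assumed RGB values for the colour names on the slide (not taken from PepBed).
MODIFICATION_COLOURS = {
    "oxidation": (0, 255, 255),  # cyan
    "acetyl": (255, 165, 0),     # orange
    "phospho": (255, 0, 0),      # red
}
DEFAULT_COLOUR = (0, 0, 0)       # black: all identified peptides


def item_rgb_for(modification=None):
    """Return the BED itemRgb string for a modification name (hypothetical helper)."""
    red, green, blue = MODIFICATION_COLOURS.get(modification, DEFAULT_COLOUR)
    return f"{red},{green},{blue}"


print(item_rgb_for())           # unmodified peptide track
print(item_rgb_for("phospho"))  # phosphorylated peptide track
```

Keeping the mapping in one dictionary means a new modification track only needs one extra entry.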
  • 18. Getting there … (Comparison with APRIS Peptidome)
      APRIS Peptidome: the eight studies covered a huge range of tissues and cell types:
        • The peptides from the PeptideAtlas database cover 51 different tissues, cell types, and developmental stages (2016).
        • The Geiger study interrogated 11 different cell types.
        • NIST database.
        • The Kim and Wilhelm analyses generated peptides from 30 and 35 distinct tissue types (51 tissues in total).
  • 19. Acknowledgements
      PRIDE Team: Manuel Bernal-Llinares (track-hub creator), Tobias Ternent (pride pipelines), Johannes Griss (pride cluster pipelines), Juan A. Vizcaino (PI).
      Sanger Team: Christoph Schlaffner (pogo tool), Jyoti Choudhary (PI).
      ENSEMBL Team: Alessandro Vullo (trackhub registry).