CNCP 2010
Guangchuang Yu
Jinan University
2010.11.19
Beijing 2010.11.10-11
overview of conference CNCP2010


  1. CNCP 2010. Guangchuang Yu, Jinan University, 2010.11.19. Beijing, 2010.11.10-11
  2. Overview
     • Fragmentation – 孙瑞祥
     • Labeling Strategy – 陆豪杰
     • De Novo Sequencing – 董梦秋 马斌 王全会 张凯中
     • Identification – 余维川 付岩 叶明亮
     • Label free semi-quantitation – 邓宁
     • Database Construction – 邵晨 杨芃原
     • Data Quality Control – 朱云平
  3. Overview
     • Data Processing Platform – 关慎恒 盛泉虎
     • Glycoproteomics – 杨芃原 应万涛 张凯中
     • Proteogenomics – 谢鹭 赵屹
     • Biological Problem oriented – 汪迎春 王通 徐平
     • Protein Structure – 张法 卜东坡
     • Others – 江瑞 张勇 张红雨
  4. Fragmentation. 孙瑞祥, ICT
     Electron Transfer Dissociation: Characterization and Applications in Protein Identification
     2010. Improved Peptide Identification for Proteomic Analysis Based on Comprehensive Characterization of Electron Transfer Dissociation Spectra. J Proteome Res.
     • Important spectral characteristics of ETD are ignored or underutilized in popular database search algorithms such as Mascot, Sequest, OMSSA, or X!Tandem
     • Analyzed 461,440 spectra to characterize ETD
       – Distinct hydrogen rearrangement patterns of +2, +3 and +4 precursors
       – Charge-reduced precursor ions and associated neutral loss peaks
     • pFind identified 63-122% more unique peptides than Mascot for doubly charged precursors at a 1% FDR cutoff
  5. Labeling Strategy. 陆豪杰, Fudan Univ
     In vivo termini amino acid labeling for quantitative proteomics
     • Covers 93% of the proteins deposited in UniProt
     • More accurate identification and quantification
     • Dual digest by Arg-C and Lys-N (increases sample complexity)
  6. De Novo Sequencing. 董梦秋, NIBS
     De novo Sequencing of Peptides using HCD Spectra
     2010. pNovo: De novo Peptide Sequencing and Identification Using HCD Spectra. Journal of Proteome Research 9:2713-2724.
     • HCD produces high mass accuracy tandem mass spectra, the majority of which contain complete ion series
     • Abundant internal and immonium ions in the HCD spectra can also help differentiate between similar sequences
     • Workflow: Ascaris suum sperm crawling related proteins → pNovo → identify peptide sequences → BLAST → homologs of C. elegans → design primers for validation
  7. De Novo Sequencing. 马斌, U of Waterloo
     Complete Homology-Assisted MS/MS Protein Sequencing (CHAMPS)
     2009. Automated protein (re)sequencing with MS/MS and a homologous database yields almost full coverage and accuracy. Bioinformatics 25:2174-2180.
     • Diagram labels: Novel protein; SPIDER; Homologous sequence; De novo sequences; CHAMPS; Complete protein sequence
     • Above 99% coverage and 100% accuracy for two standard proteins
  8. De Novo Sequencing. 王全会, BIG
     From an unknown genome to a measurable proteome: Studying on the pH-dependent proteomes in N10 bacteria by de novo sequencing
     2009. Exploring membrane and cytoplasm proteomic responses of Alkalimonas amylolytica N10 to different external pHs with combination strategy of de novo peptide sequencing. Proteomics 9:1254-1273.
     • Tandem spectra with/without SPITC labeling
     • PEAKS for automatic de novo sequencing, plus manual analysis
     • Combine filtered data
     • Validation by PCR and Western blot
     • More than 70% of the differential 2-DE spots were identified
  9. Identification. 余维川, HKUST
     Optimization-Based Peptide Mass Fingerprinting for Protein Mixture Identification
     2010. Optimization-based peptide mass fingerprinting for protein mixture identification. J. Comput. Biol 17:221-235.
     • The PMF method has two inherent disadvantages:
       – Originally designed for identifying single purified proteins rather than protein mixtures
       – Can't distinguish different peptides with identical mass
     • Heuristic algorithm
       – Introduce a scoring function for protein mixture identification
       – Local search algorithms for protein mixture identification
     • External factors might be optimized to facilitate successful protein mixture identification
       – Mass accuracy
       – Sequence coverage
       – Noise level
       – Protein number in the mixtures
  10. Identification. 付岩, ICT
     Unrestrictive modification detection based on related spectral pairs
     2009. Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and retention time differences. BMC Bioinformatics 10 (Suppl 1):S50.
     • The majority of mass spectra cannot be interpreted at present
       – Unexpected or unknown protein PTMs
     • Detect abundant PTMs in high-accuracy peptide mass spectra
       – Efficient and sequence database-independent approach
       – Based on the observation that the spectra of a modified peptide and its unmodified counterpart are correlated with each other in their peptide masses and retention times
       – Frequently occurring peptide mass differences imply possible modifications
       – Small and consistent retention time differences provide orthogonal supporting evidence
       – Use a bivariate Gaussian mixture model to discriminate modification-related spectral pairs from random ones
     • Results
       – Experiments on two glycoprotein data sets demonstrate that the method can effectively detect abundant modifications and spectral pairs
       – By including the discovered modifications in the database search, an average of 10% more spectra are interpreted
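The idea that frequently occurring peptide mass differences hint at modifications can be illustrated with a small Python sketch. It covers only the mass-difference counting step; it does not use retention time or the bivariate Gaussian mixture model from the talk. The function name, the `bin_width` and `min_count` parameters, and the example masses are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_mass_differences(masses, bin_width=0.01, min_count=2):
    """Count pairwise mass differences and keep the frequent ones.

    masses:    iterable of peptide masses in Da (hypothetical input)
    bin_width: width in Da used to group near-identical differences
    min_count: minimum number of pairs for a difference to be reported
    Returns a dict mapping the binned difference (Da) to its pair count.
    """
    counts = Counter()
    for low, high in combinations(sorted(masses), 2):
        # Group differences into integer bins so that small
        # measurement noise does not split one delta into many.
        counts[round((high - low) / bin_width)] += 1
    return {
        round(bin_index * bin_width, 4): n
        for bin_index, n in counts.items()
        if n >= min_count
    }


if __name__ == "__main__":
    # Two peptide pairs share the same made-up mass shift of 15.99 Da.
    example = [1000.00, 1015.99, 1200.50, 1216.49, 1500.00]
    for delta, n in sorted(frequent_mass_differences(example).items()):
        print(f"delta = {delta:.2f} Da seen in {n} pairs")
```

A recurring difference is only a candidate: unrelated peptides can share a mass shift by chance, which is why the talk adds retention time differences as orthogonal evidence.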
  11. Identification. 叶明亮, DICP
     Development of Methods and Platform for Data Processing in Mass Spectrometry Based Proteome Research
     PMID: 17761002/19551949/18314942/20568719/19522514/20334362
     • Unmodified peptide identification
       – Implemented a predictive genetic algorithm for optimization of filtering criteria to maximize the number of identified peptides at a fixed FDR for SEQUEST
       – Introduced an approach for calculating the posterior probability of individual peptide identifications from the "local FDR" by using the k nearest neighbors algorithm and Shannon information entropy
     • Phosphopeptide identification
       – Developed an automatic validation approach for phosphopeptide identification by combining consecutive stage MS data and the target-decoy database searching strategy
       – Developed a classification filtering strategy to improve phosphopeptide identification and phosphorylation site localization
       – Proposed a modified target-decoy database search strategy for confident phosphorylation site analysis of individual phosphoproteins without manual interpretation of spectra
       – Developed the software ArMone for processing and analysis of phosphoproteome data
  12. Label free semi-quantitation. 邓宁, ZJU
     Quantitative Analysis of Mitochondrial Proteomes using Normalized Spectral Abundance Factor
     • Samples:
       – 5 human cardiac mitochondrial samples
       – 8 murine cardiac mitochondrial samples
       – 7 murine liver mitochondrial samples
     • LC-MS/MS
     • Database search by SEQUEST, statistically validated by Scaffold
     • In-house software to generate NSAF values for quantitative analysis
     • Results:
       – Electron transport chain proteins show the highest abundances, especially in heart
       – Metabolism related proteins and urea cycle proteins are more abundant in the liver
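As an illustration of the NSAF value mentioned above, here is a minimal Python sketch using the commonly cited definition: each protein's spectral count divided by its length, normalized by the sum of that ratio over all proteins in the sample. It is not the in-house software from the talk; the function name and the example counts and lengths are made up.

```python
def nsaf(spectral_counts, lengths):
    """Compute the Normalized Spectral Abundance Factor per protein.

    spectral_counts: dict protein id -> number of matched spectra (SpC)
    lengths:         dict protein id -> protein length in residues (L)
    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    # Spectral abundance factor: counts corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    # Normalizing makes values comparable between runs of one sample type.
    return {p: value / total for p, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical proteins: B has more spectra but is four times longer.
    counts = {"A": 10, "B": 20}
    sizes = {"A": 100, "B": 400}
    for protein, value in nsaf(counts, sizes).items():
        print(protein, round(value, 3))
```

Because the values in one sample sum to 1, NSAF expresses relative rather than absolute abundance.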
  13. Database Construction. 邵晨, PUMC
     The urinary protein biomarker database
     • Data collection
       – Manual search in PubMed
       – Review by students
     • Database construction
     • Basic analysis
       – Compare different disease types
       – Simple descriptive statistical analysis
       – Construct a disease-biomarker network and show some basic topological properties
  14. Data Quality Control. 朱云平, BPRC
     A nonparametric model for quality control of database search results in shotgun proteomics
     2008. A nonparametric model for quality control of database search results in shotgun proteomics. BMC Bioinformatics 9:29.
     • Randomized databases were used for quality control
     • Existing randomized database methods neglect to combine different database search scores, which would improve their sensitivity
     • A multivariate nonlinear discriminant function (DF) based on the multivariate nonparametric density estimation technique was proposed to filter out false-positive database search results with a predictable FDR
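For readers unfamiliar with randomized (decoy) databases, the following Python sketch shows the basic single-score FDR estimate that such methods start from: among matches above a score threshold, the number of decoy hits approximates the number of false target hits. This is the simple textbook estimate, not the multivariate discriminant function proposed in the talk; the function names and example scores are assumptions.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    psms: list of (score, is_decoy) tuples, one per peptide-spectrum match
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted targets: the estimate is undefined unless
        # nothing at all passes the threshold.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def lowest_threshold_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold  # keep going: a lower cutoff may also pass
    return best


if __name__ == "__main__":
    # Made-up scores; True marks a hit against the randomized database.
    matches = [(10, False), (9, False), (8, True),
               (7, False), (6, False), (5, True)]
    print(lowest_threshold_for_fdr(matches, 0.25))
```

Filtering on one score at a time is exactly the limitation the talk addresses by combining several search scores into one discriminant function.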
  15. Data Processing Platform. 关慎恒, UCSF
     A data processing platform for mammalian proteome dynamics studies using stable isotope metabolic labeling
     2010. Analysis of proteome dynamics in the mouse brain. Proceedings of the National Academy of Sciences 107:14508-14513.
     • Data processing platform
       – Integrates a variety of software modules into a workflow
       – Specifically developed for 15N metabolic labelling:
         cross-extraction of 15N-containing ion intensities from raw data files of varying biosynthetic incorporation times;
         computation of peptide 15N incorporation distributions;
         aggregation of multiple peptide relative isotope abundance curves into a protein curve
       – Processing parameter optimization and noise reduction procedures are performed in the processing modules that need them, to reduce error propagation along the long chain of processing steps
  16. Data Processing Platform. 盛泉虎, SIBS
     BuildSummary: A software tool for assembling protein
     • Maximize the number of confident proteins above a threshold of FDR
       – By integrating results from different peptide search engines for the same dataset
     • BuildSummary
       – Allows users to combine many independent PSM (peptide-spectrum match) scoring algorithms, including de novo sequencing and spectrum library search algorithms, provided the same peptide FDR is applied to each of them using the target-decoy search approach
  17. Glycoproteomics. 杨芃原, Fudan Univ
     Mass spectrometry database for glycoprotein structures
     2009. Identification of N-Glycosylation Sites on Secreted Proteins of Human Hepatocellular Carcinoma Cells with a Complementary Proteomics Approach. Journal of Proteome Research 8:662-672.
     • Enrichment
       – Hydrophilic affinity enrichment
       – PNGase-F release of N-glycan
     • Results
       – Identified 4000 spectra of intact N-glycopeptides at an FDR of 1% in three 2DLC runs for a serum sample
       – 1500 different glycopeptides, corresponding to 250 glycosylation sites, were discovered
       – Two separate high-confidence databases for the serum sample were constructed: a naked glycopeptide (de-glycopeptide) database (523 peptides) and an N-glycan database (599 glycans)
       – The software GRIP was developed for interpretation of spectra from intact glycopeptides
  18. Glycoproteomics. 应万涛, BPRC
     Establishment of a systematic method coupling consecutive MSn and software tools for characterizing core-fucosylated glycoproteins
     2009. A Strategy for Precise and Large Scale Identification of Core Fucosylated Glycoproteins. Molecular & Cellular Proteomics 8:913-923.
     • Strategy development
       – Novel enrichment step: combining the use of lectin for CF glycoprotein enrichment with ultrafiltration for further enrichment of glycopeptides
       – Established a neutral loss-dependent MS3 scan method that specifically captures partially deglycosylated CF glycopeptides
       – Established a novel database-independent candidate spectrum-filtering method for selecting partially deglycosylated CF glycopeptides and a spectrum optimization method
  19. Glycoproteomics. 张凯中, UWO
     Glycan Structure Sequencing with Tandem Mass Spectrometry
     2008. Complexities and algorithms for glycan sequencing using tandem mass spectrometry. J Bioinform Comput Biol 6:77-91. 2009.
     • Glycan de novo sequencing
       – Glycan databases are rather incomplete
       – Determination of novel glycan structures requires de novo sequencing
     • Heuristic algorithm
       – First generates many acceptable small subtrees, which are then joined together in a repetitive process to obtain larger and larger suboptimal subtrees until reaching the desired mass
       – At each subtree size, only a limited number of subtrees are kept for later use
       – Experiments on real MS/MS data showed that the heuristic algorithm can determine glycan structures
     • Contribution
       – A polynomial time algorithm is provided under a simple model of glycan de novo sequencing
  20. Proteogenomics. 谢鹭, SIBS
     The discovery of novel protein-coding features in mouse genome based on mass spectrometry data
     • Detect un-annotated protein-coding regions in the mouse genome
       – Two searchable proteomic databases were constructed: all possible encoded exon junctions (EJCT dataset) for the discovery of novel exon splice events, and putative encoded exons (ORF database) for finding uninterrupted novel protein coding regions
       – Each of the two datasets was combined with a public full-length protein dataset (competitive dataset) and queried against 496 high-accuracy tandem MS RAW files from diverse mouse samples
       – 32 unique peptides (matching 149 spectra) from the EJCT dataset were discovered which straddle novel exon junctions
       – 104 unique peptides (matching 450 spectra) from the ORF dataset were located in 99 unique protein-coding regions
  21. Proteogenomics. 赵屹, ICT
     Proteogenomics analysis of Thermoanaerobacter tengcongensis ( 腾冲嗜热菌 ) at different temperatures
     • Genome
       – Estimated to encode 2588 theoretical proteins
     • Annotating the genome
       – By combining proteomics and transcriptomics: transcriptomic data cover above 70% of the 2588 genes; above 74% of spectra were consistent with the transcriptomic data
       – Quantitative analysis of gene expression levels at 4 different temperatures: 359 genes were commonly expressed; uniquely expressed genes were also detected at distinct temperatures
       – 80 genes do not belong to the 2588 gene set: 2 coding regions were supported by MS; 21 coding regions may encode novel non-coding RNA
       – The discovery was used to re-annotate the 2588 gene set
  22. Biological Problem oriented. 汪迎春, IGDB
     Deciphering the Signaling Network in the Leading Edge of the Migrating Cells
     2007. Profiling signaling polarity in chemotactic cells. Proceedings of the National Academy of Sciences 104:8328-8333.
     • Characterization of the Ras/ERK Signaling Pathway in the PD by Combined Proteome and Phosphoproteome Profiling
  23. Biological Problem oriented. 王通, JNU
     Pathway analysis-assisted study strategy in functional proteomics
     2008. HIV-1 infected astrocytes and the microglial proteome. Journal of Neuroimmune Pharmacology 3:173-186.
     • Biological questions
       – HIV associated neurodegenerative disorders (HAND)
       – HIV associated malignancy (HAM)
       – Infection and cancer
  24. Biological Problem oriented. 徐平, BPRC
     Data analysis in large scale quantitative proteomics study with SILAC approach
     2009. Quantitative Proteomics Reveals the Function of Unconventional Ubiquitin Chains in Proteasomal Degradation. Cell 137:133-145.
     • Background
       – K48-linked chains are mediators of proteasomal degradation
       – Chains linked through K6, K11, K27, K29 or K33 are not well understood
     • Results
       – Identified K11 linkage-specific substrates, including Ubc6, which is involved in the ERAD pathway (ER stress response)
  25. Protein Structure. 张法, ICT
     Computational methods in cryo-electron microscopy: image data processing and 3D structure reconstruction
     2009. A framework to refine particle clusters produced by EMAN. Bioinformatics 12:i276-i280.
     • EMAN
       – One of the most popular software packages for single particle reconstruction
     • Particle reclustering framework (PRF)
       – Normalization
       – Threshold determination
       – Reclustering
  26. Data Analysis. 卜东坡, ICT
     Designing Succinct Structural Alphabets
     2008. Designing succinct structural alphabets. Bioinformatics 24:i182-i189.
     • Fragment libraries
       – A small number of structural fragments can model protein structures accurately
       – Library size and accuracy are the dominating factors for modeling and predicting protein structures accurately
       – A major bottleneck for fragment-based protein structure prediction methods is designing a succinct and highly accurate structural alphabet
     • Contributions
       – Introducing structural information items, such as secondary structure, solvent accessibility and contact capacity, can improve the prediction of structural fragments
       – Derive the best combination of sequence and structural information items and significantly reduce the structural alphabet size at the same level of accuracy by using integer linear programming
       – Significantly improve protein structure prediction, with all other conditions unchanged
     • Scoring function for mapping a sequence segment to a structural fragment
       – Consists of mutation score, secondary structure score, contact capacity score, and environment fitness score
       – Using more scoring items to improve the performance is promising
  27. Others. 江瑞, Tsinghua
     DomainRBF: a Bayesian regression approach to the prioritization of associations between protein domains and human complex diseases
     2010. Prioritisation of associations between protein domains and complex diseases using domain-domain interaction networks. Systems Biology, IET 4:212-222.
     • DomainRBF (domain Rank with Bayes Factor)
       – Prioritizes associations between candidate domains and human diseases
       – Ranking score based on the 'guilt-by-association' principle, which relies on the assumption that a disease is likely to be caused by a set of genes that have similar properties
     • Data sources
       – Domain-disease associations
       – Domain-domain interaction networks
     • Validation
       – Large-scale cross validation experiments on simulated linkage intervals, random controls and the whole genome
       – Results show that areas under ROC curves can be as high as 77.9%
  28. Others. 张红雨, HZAU
     Proteins as molecular fossils
     2010. A Universal Molecular Clock of Protein Folds and its Power in Tracing the Early History of Aerobic Metabolism and Planet Oxygenation. Molecular Biology and Evolution.
     • Proteins can also serve as molecular fossils
     • Building phylogenies and timelines of domains at fold and fold superfamily levels of structural complexity
       – Using a phylogenomic structural census in hundreds of proteomes
       – Correlate approximately linearly with geological timescales
       – Dissected the structures and functions of enzymes in simulated metabolic networks
       – The placement of anaerobic and aerobic enzymes in the timeline revealed that aerobic metabolism emerged ~2.9 billion years ago
  29. Others. 张勇, BGI Shenzhen
     From NGS Genomics to MS-based Proteomics: BGI's bioinformatics activities
     • Advertising from BGI Shenzhen
       – Introduces BGI's developmental progress
  30. All slides will be available at http://cncp2010.ict.ac.cn/
  31. Phosphorylation. Kevan Shokat, UCSF
     Kinase-specific phosphorylation analysis
     2008. Covalent capture of kinase-specific phosphopeptides reveals Cdk1-cyclin B substrates. Proceedings of the National Academy of Sciences 105:1442-1447.
  32. Phosphorylation. Kevan Shokat, UCSF
     Kinase-specific phosphorylation analysis
     2004. Design and use of analog-sensitive protein kinases. Curr Protoc Mol Biol Chapter 18:Unit 18.11.
     • The amino acid that must be changed to construct -as kinase alleles can be most easily identified using a freely available online resource at http://kinase.ucsf.edu/ksd/.
  33. Thank You!
