Deep proteome and transcriptome mapping of human cervical cancer cell line
  • To understand something, we sometimes need to look at the pieces before we can see the bigger picture. In attempting a system-wide understanding of a biological concept, we need an inventory of the system’s building blocks.
  • To understand human genes, the human genome sequence was elucidated.
  • According to the Central Dogma of Molecular Biology, information flows from DNA, which is transcribed into RNA, which is then translated into functional proteins.
  • RNA-Seq, also called "Whole Transcriptome Shotgun Sequencing" [1] ("WTSS") and dubbed "a revolutionary tool for transcriptomics",[2] refers to the use of high-throughput sequencing technologies to sequence cDNA in order to get information about a sample's RNA content. Related topics: next-generation sequencing techniques, transcriptome alignment, direct RNA sequencing.
  • iBAQ (intensity-based absolute quantification): uses the total ion intensity for all of the peptides observed, normalized against the total theoretical number of observable peptides. FPKM: fragments per kilobase of exon per million fragments mapped.
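The two quantification measures defined in the notes above can be sketched in a few lines of Python. This is a minimal illustration of the definitions only, not the MaxQuant or RNA-Seq pipeline used in the paper; the function names, argument names and numbers are hypothetical.

```python
def ibaq(peptide_intensities, n_theoretical_peptides):
    """iBAQ as defined in the notes: summed intensity of the observed
    peptides divided by the number of theoretically observable peptides."""
    if n_theoretical_peptides <= 0:
        raise ValueError("n_theoretical_peptides must be positive")
    return sum(peptide_intensities) / n_theoretical_peptides


def fpkm(fragments_mapped_to_gene, exon_length_bp, total_mapped_fragments):
    """FPKM: fragments per kilobase of exon per million fragments mapped."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("lengths and totals must be positive")
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments_mapped_to_gene / (kilobases * millions_mapped)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(ibaq([2.0e6, 1.0e6, 3.0e6], n_theoretical_peptides=12))
    print(fpkm(500, exon_length_bp=2_000, total_mapped_fragments=10_000_000))
```

Dividing by the number of theoretically observable peptides keeps long proteins, which yield more peptides, from looking more abundant than short ones; FPKM applies the same length normalization on the transcript side.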

    1. Deep proteome and transcriptome mapping of a human cancer cell line
       BARBA | BIADOMANG | CUA | DAYANAN | LOPEZ
       Authors: N. Nagaraj, J. Wisniewski, T. Geiger, J. Cox, M. Kircher, J. Kelso, S. Paabo, M. Mann
       Journal: Molecular Systems Biology
       Date published: October 29, 2011
    2. The human genome comprises a mere 20,000 protein-coding genes.
    3. RNA-Seq transcriptomics: transcripts of between 8,000 and 16,000 protein-coding genes
    4. High-res MS-based proteomics
       • Essentially complete proteome of a model organism (yeast)
       • Limited to 4,000-6,000 protein groups in mammalian systems
    5. “…explore a human proteome in the depth achievable with current technology and to compare it with the corresponding transcriptome.”
    6. METHODOLOGY
    7. HeLa cell lysate (flowchart): cell lysis; flash freezing; store at -80 °C; sonication; centrifugation; protein content determination
    8. Protein fractionation by gel filtration (flowchart): 0.1 mL cell lysate; load onto GL column; elution with buffer
    9. Protein digestion and peptide fractionation (flowchart): removal of detergent; protein digestion (trypsin, Lys-C, Glu-C)
    10. Mass spectrometry (flowchart): C18 purification; RP chromatography; MS
    11. RNA sequencing (flowchart): extraction; quantification; enrichment; RNA library preparation; fragmentation; RNA fragments copied into DNA; blunt-end conversion; addition of deoxyadenosine; ligation of forked adaptors; amplification; sequencing
    12. Data analysis: data availability; gene and transcript quantification
    13. RESULTS AND DISCUSSION
    14. Sample: The HeLa cells
        • HeLa cells
          – Human cervical carcinoma cell line
            • “Immortal cells”: can grow indefinitely and be frozen for decades
            • Standardized the field of tissue culture
          – Named after Henrietta Lacks by a scientist at Johns Hopkins Hospital
            • A piece of her tumor was taken
            • Her cells never died
          – Prolific growth may be due to HPV18
            • HPV18 viral proteins (E6 and E7) suppress the p53 and pRb gene products, respectively.
    15. Proteome Coverage Study
        • Objective: to achieve maximum proteome coverage in a reasonable measurement time
          – Procedure: investigate the effects of protein fractionation, proteolytic digestion, peptide fractionation, and reverse-phase chromatography
    16. Proteome Coverage Study
        • Protein fractionation
          – Gel filtration: separation based on size and shape
        • Proteolytic digestion
          – Trypsin: cleaves on the C-terminal side of lysine and arginine
          – Glu-C: cleaves on the C-terminal side of glutamic acid residues
          – Lys-C: cleaves on the carboxyl side of lysine residues
          – Note: the choice of digestion strongly affects how well proteins can be characterized and identified by mass spectrometry
            • Overlapping fragments = larger sequence coverage
    17. Proteome Coverage Study
        • Pipette-based prefractionation
          – Strong anion exchange resin
            • Independent of pH
            • Mostly used for deep coverage of the composition of the sample or when specific peptides should be enriched
        • Reverse-phase chromatography
          – Reduces the complexity of the peptide mixture by selecting peptides for tandem mass spectrometry according to their polarity
    18. Proteome Coverage Study
        • LC-MS/MS analysis
          – Peptide MS spectra
            • Interpreted by comparison with lists generated from the theoretical digestion of proteins
          – Fragment MS/MS spectra
            • Interpreted by comparison with the theoretical fragmentation of peptides
          – The elution time of a peptide depends on its polarity
          – Repeated extensively to increase the number of peptides for which tandem mass spectra are acquired, thereby reducing the complexity of the sample
    19. Proteome Coverage Study
        • The procedure is referred to as “shotgun” proteomics
          – Most successful strategy for achieving extensive proteome coverage
          – Summary: proteins are extracted from their biological source and subjected to enzymatic digestion, and the resulting peptide mixtures are analysed by LC-MS/MS
            • Additional fractionation steps for proteins/peptides can also be conducted
    20. MaxQuant Computational Proteomics Environment
        • MaxQuant
          – Quantitative proteomics software package designed for analyzing large mass spectrometric data sets
          – Has an integrated search engine, Andromeda
          – Supported instrument: LTQ-Orbitrap
            • Orbitrap: ions circulate around a central, spindle-shaped electrode
            • Highly accurate: the frequency of axial oscillation, determined with high precision, is inversely proportional to the square root of m/z.
    21. MaxQuant Computational Proteomics Environment
        • Data acquired: 2 337 336 high-resolution fragmentation spectra and high-accuracy precursor masses
        • Search engine: Andromeda
          – Algorithm: uses a probability-based approach to match tandem mass (MS/MS) spectra to peptide sequences in databases
          – Median peptide score: 121; 6% have a score below 60
            • Each score corresponds to the sum of the highest ion scores for each distinct sequence
    22. MaxQuant Computational Proteomics Environment
        • Average identification of fragmentation spectra: 43%
        • Average absolute mass deviation of the precursors and of the matched fragment masses: 1.2 and 4.8 p.p.m., respectively
    23. MaxQuant Computational Proteomics Environment
        • Result of analysis
          – Number of peptides identified and quantified
            • 163 784 peptides
          – FDR (false discovery rate): 1%
          – Out of 163 784 peptides
            • 84 051 from tryptic digestion
            • 52 108 from Lys-C digestion
            • 44 704 from Glu-C digestion
    24. MaxQuant Computational Proteomics Environment
        • Result of analysis
          – From the data obtained, MaxQuant identified 10 255 proteins with 99% confidence
            • A lower bound on the number of proteins expressed in HeLa cells
          – Overlapping fragments from the different enzymatic cleavages were observed
            • Tryptic digestion: yielded the highest number of identifications
            • Lys-C digestion: 85% overlapped with trypsin
            • Glu-C digestion: 5.2% novel identifications
            • Shows that <5% of all proteins were identified by only one peptide
            • Taken all together, >25% median sequence coverage
    25. ENSEMBL database and GENSCAN predictions
        • ENSEMBL-annotated human protein-coding genes
          – MS/MS spectra: searched against the ENSEMBL database with GENSCAN predictions
          – 10 255 proteins were mapped to 9207 human protein-coding genes
            • Largest number of identified genes on chromosome 1
            • Smallest number of identified genes on chromosome 21
          – GENSCAN predictions: >1900 peptides not known to ENSEMBL genes
    26. ZAB
    27. Completeness of Detected Proteome
        • Inspecting the macrocomplexes that are functionally necessary
        • Proteasome, spliceosome, histone-modifying complexes and respiratory chain complexes
        • CORUM protein complex database
    28. CORUM protein complex database
        • Collection of experimentally verified mammalian protein complexes
          – Protein complex function
          – Localization
          – Subunit composition
          – Literature references
    29. • Mean proteome coverage of all CORUM protein complexes was >95%
        • Transcriptome coverage: 96.5%
        • Complexes with lower coverage owing to cell-type specificity are listed on the next slide
    30. Complex                                                   Normally expressed in   % Coverage
        Sarcoglycan-sarcospan                                     Muscle                  20
        SNARE (Soluble N-ethylmaleimide-sensitive factor
          Attachment protein Receptor)                            Neuronal tissue         40
        ITGA2b-ITGB3                                              Platelets               50
        • Sarcoglycan-sarcospan provides structural integrity in muscle tissues
        • SNARE mediates neurotransmitter release in synapses
        • ITGA2b-ITGB3 is a fibronectin receptor that plays a crucial role in coagulation
    31. • 5% of the HeLa cell population was in mitosis
          – 61 of 63 proteins in a reference set of cell cycle-specific proteins
          – High coverage of most metabolic pathways pertaining to basic cellular function
        • The comprehensiveness of the proteome is hard to determine by comparison with pathway databases because they contain cell type-specific proteins
    32. Quantitative Analysis
        • Deep-sequencing transcriptomics
          – Proteomics data: >90% complete
        • Transcriptome + proteome data
          – 10,000-12,000 genes expressed in HeLa cells
        • iBAQ (intensity-based absolute quantification)
          – Sums the individual peptide signals in MS and normalizes by the number of observable peptides of the protein
          – Estimates the absolute amount of each protein
    33. Quantitative Analysis
        • The 40 most abundant proteins comprised 25% of the proteome
          – Filamin A, pyruvate kinase, enolase, vimentin, Hsp60
        • 600 proteins account for 75% of the HeLa cell proteome mass
    34. Quantitative Analysis
        • The contribution of each protein to the total mass, combined with the known number of cells in the initial sample
          – Gives a rough estimate of the absolute copy number of the proteins in HeLa cells
    35. Quantitative Analysis
        • Ranked distribution of proteins
          – 90% of proteins are within a factor of 60 above or below the median protein copy number of 18,000 molecules per cell
          – The lower half accounts for <2% of the total protein mass
    36. Quantitative Analysis
        • Protein abundance values
          – Used to estimate the proportional contribution of any:
            • individual protein,
            • protein complex and
            • protein class
            to the total proteome
    37. Quantitative Analysis
        • Ribosomes (encoded by only 1% of human genes)
          – 195 proteins contributed 6% to total protein mass
        • The actin cytoskeleton contributes four-fold more to the proteome mass than expected from the number of genes and proteins
    38. Quantitative Analysis
        • Integral membrane proteins
          – 25% of the genome
          – 7.6% of protein mass
    39. Quantitative Analysis
        • Protein folding
          – 2% of the identified proteome by number
          – 8% of proteome mass
    40. Quantitative Analysis
        Percentage of the total
                          Integral membrane proteins   “Protein folding” proteins
        Human genome      25                           2
        Protein mass      7.6                          8
        • Differences are due to cell type-specific functions of these proteins
    41. Structural proteins and proteins in basic machineries > regulatory proteins
    42. Ribosomal proteins form a tight cluster at the top end
    43. The proteasome is also abundant, but not its regulatory subunits (a factor of 100 less)
    44. Cytoskeletal and metabolic proteins extend over a broad range
    45. Enolase: highest expression value
        Glycogen phosphorylase: 100,000-fold less at the protein level and 10,000-fold less at the transcript level
    46. • Regulatory proteins such as protein kinases and transcription factors have, on average, lower expression than the structural proteins
        • Each category spans a large expression range
        • Expression levels can provide starting points for systems biological modeling of the cell
    47. TRANSCRIPTOME: RNA-Seq
        PROTEOME: High-res MS
        “Given the rapid technological progress in both fields, we predict that the required depth of 10,000–12,000 genes will be routinely reachable soon.”
    48. THANK YOU
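Slide 16 states the cleavage rules for the three proteases and notes that overlapping fragments give larger sequence coverage. The sketch below illustrates that idea with a simplified in silico digest. It assumes complete cleavage with no missed cleavages and ignores enzyme-specific exceptions; the toy sequence, the `min_len` cutoff and all function names are hypothetical, and this is not how MaxQuant/Andromeda performs its theoretical digestion.

```python
# Residues after which each enzyme cleaves (C-terminal side), per slide 16.
CLEAVAGE_RESIDUES = {
    "trypsin": "KR",  # lysine and arginine
    "lys-c": "K",     # lysine
    "glu-c": "E",     # glutamic acid
}


def digest(sequence, enzyme):
    """Return (start_index, peptide) pairs for a complete digest."""
    residues = CLEAVAGE_RESIDUES[enzyme]
    peptides, start = [], 0
    for i, amino_acid in enumerate(sequence):
        if amino_acid in residues:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in a cleavage residue.
        peptides.append((start, sequence[start:]))
    return peptides


def coverage(sequence, enzymes, min_len=5):
    """Fraction of residues covered by peptides of at least min_len residues.

    min_len is a hypothetical stand-in for the fact that very short
    peptides are usually not identified.
    """
    covered = set()
    for enzyme in enzymes:
        for start, peptide in digest(sequence, enzyme):
            if len(peptide) >= min_len:
                covered.update(range(start, start + len(peptide)))
    return len(covered) / len(sequence)


if __name__ == "__main__":
    toy = "AAAKCCCRDDDEFFF"  # hypothetical sequence, not a real protein
    for enzymes in (["trypsin"], ["trypsin", "glu-c"]):
        print(enzymes, round(coverage(toy, enzymes), 2))
```

In this toy case trypsin alone leaves the two short N-terminal peptides below the length cutoff, while the longer Glu-C peptide spans them, which is the sense in which a second enzyme adds coverage.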

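Slides 34 and 35 describe turning each protein's share of the total protein mass into a rough copy number per cell, and report that 90% of proteins lie within a factor of 60 of the median of 18,000 molecules per cell. The arithmetic can be sketched as follows. The per-cell protein mass, the molecular weight and the mass fraction used in the example are hypothetical placeholders, not values from the paper; only the median and the factor of 60 come from the slides.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(mass_fraction, protein_mass_per_cell_g, molecular_weight_g_per_mol):
    """Rough copy number: the protein's share of the cell's protein mass,
    converted to moles and then to molecules."""
    moles = mass_fraction * protein_mass_per_cell_g / molecular_weight_g_per_mol
    return moles * AVOGADRO


def abundance_window(median_copies, factor):
    """Lower and upper bounds lying `factor`-fold below and above the median."""
    return median_copies / factor, median_copies * factor


if __name__ == "__main__":
    # Hypothetical protein: 1% of protein mass, 150 pg protein per cell, 50 kDa.
    print(f"{copies_per_cell(0.01, 150e-12, 50_000):.3g} copies per cell")

    # Values from slide 35: median 18,000 copies, factor of 60.
    low, high = abundance_window(18_000, 60)
    print(low, high)  # 300.0 1080000
```

With the slide's numbers, the window that holds 90% of proteins runs from about 300 to about 1,080,000 copies per cell.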