CeMM
Research Center for Molecular Medicine
of the Austrian Academy of Sciences

The isobar R package:
Analysis of quantitative proteomics data

F. Breitwieser, J. Colinge
Bioinformatics Open Source Conference, 2011

isobar for Analysis of Quantitative Proteomics Data

   ■   Mass Spectrometers to identify and quantify proteins
   ■   isobar: R package for handling isobarically tagged data
          □   analyze and visualize protein expression changes
          □   interactive within R
          □   scripts to generate PDF (via LaTeX) and Excel reports
   ■   http://bioinformatics.cemm.oeaw.ac.at/isobar

   [Slide background: front page of the article in Journal of Proteome Research, pubs.acs.org/jpr, Msc: pr-2010-012784]

   General Statistical Modeling of Data from Protein Relative Expression Isobaric Tags

   Florian P. Breitwieser (a), André Müller (a), Loïc Dayon (b), Thomas Köcher (c), Alexandre Hainard (b), Peter Pichler (d),
   Ursula Schmidt-Erfurth (e), Giulio Superti-Furga (a), Jean-Charles Sanchez (b), Karl Mechtler (c), Keiryn L. Bennett (a),
   and Jacques Colinge* (a)

   (a) CeMM, Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
   (b) Biomedical Proteomics Group, Department of Structural Biology and Bioinformatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland
   (c) Institute of Molecular Pathology, Vienna, Austria
   (d) CD Laboratory for Proteome Analysis, University of Vienna, 1030 Vienna, Austria
   (e) Department of Ophthalmology, Medical University of Vienna, Vienna, Austria

   ABSTRACT: Quantitative comparison of the protein content of biological samples is a fundamental tool of research.
   The TMT and iTRAQ isobaric labeling technologies allow the comparison of 2, 4, 6, or 8 samples in one mass
   spectrometric analysis. Sound statistical models that scale with the most advanced mass spectrometry (MS)
   instruments are essential for their efficient use. Through the application of robust statistical methods, we
   developed models that capture variability from individual spectra to biological samples. Classical experimental
   designs with a distinct sample in each channel as well as the use of replicates in multiple channels are
   integrated into a single statistical framework. We have prepared complex test samples including controlled
   ratios ranging from 100:1 to 1:100 to ...

Quantitative Proteomics via Mass Spectrometry
   ■   peptide fragmentation spectrum for identification
   ■   isobaric peptide tags for quantification
          □   up to 8 different samples
   ■   isobar package
          □   extracts identification from Mascot/Phenyx results
          □   extracts quantitative information from spectrum
          □   groups proteins to have reporters with specific peptides
Modelling Technical Variability on a Spectrum Level
                                        ■   correct for isotope impurities
                                        ■   normalize
                                        ■   handle technical variability
                                            □   depends on signal intensity
                                            □   using noise model
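
The first two steps in the list can be illustrated without the package. This is a minimal base R sketch, not isobar's implementation: it assumes a hypothetical 4-plex impurity matrix with made-up fractions and simulated intensities, undoes the channel cross-talk by solving a linear system, and then equalizes channel medians as one simple way to normalize. The names `impurity`, `true_int`, and `normalize_median` are illustrative only.

```r
# Hypothetical 4-plex setup; the impurity fractions below are made up.
channels <- c("114", "115", "116", "117")

# impurity[i, j]: fraction of the signal of tag j that is observed in channel i.
impurity <- matrix(c(0.95, 0.03, 0.00, 0.00,
                     0.05, 0.94, 0.04, 0.00,
                     0.00, 0.03, 0.93, 0.05,
                     0.00, 0.00, 0.03, 0.95),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(channels, channels))

# Simulated true reporter intensities: one row per spectrum, one column per channel.
true_int <- matrix(c(1000, 2000, 1500,  500,
                      800,  800,  800,  800,
                     3000, 1000, 2000, 4000),
                   ncol = 4, byrow = TRUE,
                   dimnames = list(NULL, channels))

# What the instrument would report: true intensities mixed by the impurities.
observed <- true_int %*% t(impurity)

# Correction: solve impurity %*% x = observed for every spectrum.
corrected <- t(solve(impurity, t(observed)))

# One simple normalization: rescale each channel so that all medians agree.
normalize_median <- function(x) {
  med <- apply(x, 2, median)
  sweep(x, 2, mean(med) / med, `*`)
}
normalized <- normalize_median(corrected)
```

After correction, `corrected` matches `true_int` up to rounding error, and every column of `normalized` has the same median.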




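   ib <- correctIsotopeImpurities(ib)   # correct for isotope impurities
   ib <- normalize(ib)                  # normalize
   nm <- NoiseModel(ib)                 # noise model: technical variability depends on signal intensity
   maplot(ib, channel1="114", channel2="115", noise.model=nm)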
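
To make "technical variability depends on signal intensity" concrete, here is a hedged base R sketch on simulated data. It is not the package's noise model: it assumes two channels carrying a true 1:1 ratio, made-up noise parameters, and a simple binned variance estimate in place of a fitted noise function. `noise_sd_at` is a hypothetical helper.

```r
set.seed(1)
n <- 5000

# Simulated spectra: log10 of the true intensity, identical in both channels.
log_int <- runif(n, 2.5, 6.5)

# Assumed noise: large at low intensity, small at high intensity.
sd_true <- 0.02 + 0.5 * exp(-(log_int - 2.5))
ch114 <- 10^(log_int + rnorm(n, sd = sd_true / sqrt(2)))
ch115 <- 10^(log_int + rnorm(n, sd = sd_true / sqrt(2)))

# MA-plot coordinates: M is the log ratio, A the average log intensity.
M <- log10(ch115) - log10(ch114)
A <- (log10(ch114) + log10(ch115)) / 2

# Empirical noise estimate: variance of M within ten intensity bins.
bins <- cut(A, breaks = quantile(A, probs = seq(0, 1, by = 0.1)),
            include.lowest = TRUE)
noise_var <- as.numeric(tapply(M, bins, var))
bin_mid   <- as.numeric(tapply(A, bins, mean))

# Interpolated standard deviation of the log ratio at a given intensity.
noise_sd_at <- function(a) {
  sqrt(approx(bin_mid, noise_var, xout = a, rule = 2)$y)
}
```

Plotting `M` against `A` and overlaying `±1.96 * noise_sd_at(A)` gives the funnel shape typical of an MA plot with a noise band.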

Differential Protein Expression

   ■   spectra → peptides → protein
   ■   summarize ratio with a weighted mean
          □   relative to spectrum intensity
   ■   calculate significance after assessing biological variability
   ■   compute ratios between classes
          □   Healthy versus Diseased

   [Figure: CERU_RAT; spectrum ratios 115/114, 116/114, 117/114 (y-axis: ratio, log scale, 1 to 20) versus average intensity (x-axis, log scale, 5e+02 to 5e+06)]

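   # ratio between channels 114 and 116 for the protein ceru.rat
   estimateRatio(ib, noise.model.hcd, "114", "116", ceru.rat)
   # ratios between the classes H (healthy) and D (diseased)
   proteinRatios(ib, cl=c("H","H","D","D"),
                 summarize=TRUE, method="interclass")
   # spectrum ratios of ceru.rat relative to channel 114
   maplot2(ib, relative.to="114", ceru.rat, main="CERU_RAT")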
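
The "weighted mean" summary can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code behind `estimateRatio`: it assumes each spectrum comes with a log10 ratio and a noise variance looked up from its intensity, and it weights by inverse variance. The function name `weighted_log_ratio` and all numbers are hypothetical.

```r
# Combine per-spectrum log ratios into one protein-level ratio.
# log_ratio: log10 ratio of each spectrum
# noise_var: technical variance assumed for each spectrum (smaller at high intensity)
weighted_log_ratio <- function(log_ratio, noise_var) {
  stopifnot(length(log_ratio) == length(noise_var), all(noise_var > 0))
  w <- 1 / noise_var                      # intense spectra get more weight
  estimate <- sum(w * log_ratio) / sum(w) # inverse-variance weighted mean
  variance <- 1 / sum(w)                  # variance of that weighted mean
  list(lratio = estimate, variance = variance, ratio = 10^estimate)
}

# Five hypothetical spectra of one protein; the fourth is a weak, noisy spectrum.
spectra_lr  <- c(0.30, 0.25, 0.40, 0.10, 0.35)
spectra_var <- c(0.010, 0.005, 0.050, 0.200, 0.008)

res <- weighted_log_ratio(spectra_lr, spectra_var)
```

The noisy fourth spectrum barely moves the estimate, which is the point of weighting by signal quality instead of taking a plain average.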
Deciding on significant regulation

   ■   Biological variability
          □   can be learned from replicates
   ■   'Volcano plot'
          □   fold change versus p-value

   [Figure, left: panel labeled h1, h2, d1, d2; axis ticks −1, 0, 1]
   [Figure, right: volcano plot; x-axis: − log10 sample p−value (−4 to 4), y-axis: − log10 signal p−value (0 to 60)]

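
One way to read the two p-values is sketched below in base R. It is a simplified stand-in, not the package's procedure: it assumes a normal null distribution estimated robustly from replicate ratios (the distribution actually used may differ), hypothetical protein ratios, and a 0.05 cut-off on both p-values. All names and numbers are illustrative.

```r
set.seed(42)

# Log10 ratios between biological replicates (e.g. h1 versus h2): no true change,
# so their spread reflects biological variability.
replicate_lr <- rnorm(500, mean = 0, sd = 0.05)
center <- median(replicate_lr)
spread <- mad(replicate_lr)

# Sample p-value: how unusual is a ratio compared with replicate variability?
sample_p <- function(lr) 2 * pnorm(-abs(lr - center) / spread)

# Hypothetical diseased-versus-healthy log10 ratios with their technical
# standard deviations (as a noise model might provide).
class_lr <- c(P1 = 0.02, P2 = 0.30, P3 = -0.45)
class_sd <- c(P1 = 0.03, P2 = 0.04, P3 = 0.30)

# Signal p-value: is the ratio distinguishable from 1:1 given technical noise?
signal_p <- 2 * pnorm(-abs(class_lr) / class_sd)
p_sample <- sample_p(class_lr)

# Call a protein regulated only if both criteria agree.
significant <- p_sample < 0.05 & signal_p < 0.05
```

Here P1 changes too little, P3 is measured too imprecisely, and only P2 passes both tests. Plotting `class_lr` against `-log10(signal_p)` gives the volcano layout.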
Automating the Analysis - PDF Report

   [Report excerpt: protein ratios; the last report column is a ratio bar on a scale marked -5, 1, 5]

          ch1  ch2  protein                               group  peptides  spectra  ratio
      1   C    T    Serpina1e: Q00898                     1/1    7         1        0.22
      2   C    T    Acaca: Q5SWU9^1,2                     2/2    5         4        0.40
      3   C    T    Atp5j: P97450                         1/1    4         19       0.49
      ...
    130   C    T    Hist1h3a: P68433, Hist1h3c: P84228    2/3    8         2        2.42
    131   C    T    Postn: Q62009^1−5                     5/5    1         3        3.05
    132   C    T    Myh7: Q91Z83                          1/1    128       62       3.66

   ■   via Sweave: R code within LaTeX
          □   reproducible research
   ■   sections
          □   Significantly regulated proteins
          □   All protein ratios
          □   Protein grouping
   ■   not shown: QC report, Excel report

   [Report excerpt: protein grouping]

   Proteins
     pos  accession  gene name  protein name
     1    P68433     Hist1h3a   Histone H3.1
     1    P84228     Hist1h3c   Histone H3.2
     2    P84244     H3f3b      Histone H3.3

   Peptides
     rs gs us  peptides
     1  1  7  0
     2  0  7  0

   Sweave("isobar-analysis.Rnw")   # generate report using Sweave
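
For readers unfamiliar with Sweave, the following self-contained sketch shows the mechanism on a toy document. It does not use the package's report template: the file name `mini-report.Rnw` and its contents are hypothetical, and the three ratios are copied from the report excerpt on this slide. Only `utils::Sweave` from base R is called, and no LaTeX installation is needed to produce the `.tex` file.

```r
# Write a tiny Sweave source file: LaTeX with an embedded R chunk.
rnw <- file.path(tempdir(), "mini-report.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<ratios, echo=FALSE>>=",
  "ratios <- c(Serpina1e = 0.22, Acaca = 0.40, Myh7 = 3.66)",
  "@",
  "The largest ratio is \\Sexpr{max(ratios)}.",
  "\\end{document}"
), rnw)

# Run the R code and weave the results into a LaTeX file.
tex <- file.path(tempdir(), "mini-report.tex")
utils::Sweave(rnw, output = tex, quiet = TRUE)

# The \Sexpr{} expression has been replaced by its computed value.
tex_lines <- readLines(tex)
```

Because the numbers in the report are computed when the document is woven, rerunning the analysis on new data regenerates the whole report, which is what makes the approach reproducible.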
Acknowledgments
   ■   Research Center for Molecular Medicine, Vienna
          □   Jacques Colinge
          □   Keiryn Bennett’s Masspec group
          □   Giulio Superti-Furga
          □   Bioinformatics group
               ■      Alexey Stukalov
               ■      Gerhard Duernberger
               ■      Patrick Meidl


   ■   isobar Collaborators
          □   University of Geneva: Jean-Charles Sanchez
          □   IMP, Vienna: Peter Pichler and Karl Mechtler
   ■   Open Source Software Developers
          □   Richard Stallman, Linus Torvalds, Robert Gentleman, . . .
          □   Donald Knuth, Hadley Wickham, Till Tantau, . . .
Appendix: Quality Control Report

   [Figure: histograms of reporter mass; panels: tag 116 (m/z 116.11) and tag 117 (m/z 117.11); x-axis: mass (−1e−03 to 1e−03), y-axis: count (0 to 500)]

   ■   shows reporter mass precision and biological variability
   reporterMassPrecision(ib)   # reporter mass precision
   Sweave("isobar-qc.Rnw")     # generate the QC report using Sweave
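
The idea behind a reporter mass precision check can be sketched in base R as follows. This is an assumption-laden illustration, not what `reporterMassPrecision` does internally: the observed masses are simulated with a made-up spread, and the expected m/z values are the rounded ones shown in the panel titles.

```r
set.seed(7)

# Expected reporter m/z, as given in the panel titles above.
expected_mz <- c("116" = 116.11, "117" = 117.11)

# Simulated observed reporter masses, 1000 spectra per tag (made-up precision).
observed_mz <- lapply(expected_mz, function(mz) mz + rnorm(1000, sd = 2e-4))

# Deviation of each observed mass from the expected one.
deviation <- Map(function(obs, mz) obs - mz, observed_mz, expected_mz)

# Histogram counts per tag over the window −1e−03 to 1e−03.
breaks <- seq(-1e-3, 1e-3, length.out = 21)
counts <- lapply(deviation, function(d) {
  d <- d[abs(d) <= 1e-3]                 # drop values outside the window
  hist(d, breaks = breaks, plot = FALSE)$counts
})
```

A narrow histogram centered on zero indicates well-calibrated reporter masses; a shifted or broad one points to calibration or extraction problems.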


Appendix: Protein Identification using Mass Spectrometer