Tag-based transcript sequencing:
          Comparison of SAGE and CAGE

                                              Matthias Harbers




          European Meeting on
          Next Generation Sequencing
          August 29 to September 1, 2010
          Leiden University Medical Center, The Netherlands

Focusing on transcriptome analysis:
 [Figure: flow of information from genome to protein]
 Nucleus: Genomic DNA (storage of information) with promoter, transcription factors, transcript start site, and “gene”
 Transcription by RNA polymerase II
 Processed mRNA (transport of information): cap (7-methylguanosine cap) and poly(A) tail (AAAAA)
 Cytoplasm: Translation at ribosome
 Protein (tools to operate “functions”)
 Non-coding RNAs (mostly regulatory functions?)

 Transcript information is the basis for understanding genomes, proteins, and non-coding RNAs!

cDNA cloning and sequencing:

 [Figure: from genome to cDNA clone resources]
 Genome: gene model with exons 1a, 1b, 2, 3, 4, 5
 mRNA pool (polyadenylated mRNAs, AAAAAA)
 cDNA library preparation
 cDNA library
 Random clone picking
 High-throughput sequencing
 Databases
 Clone resources (representative clone for each gene): great asset for research community!
 Functional analysis of genes.


What did we learn from cDNA cloning projects?

 End-sequencing of cDNA clones 1st approach to transcript discovery

 Great improvements in full-length cDNA cloning

 Building of large cDNA collections (FANTOM, MGC, others…)

 Limited by throughput of capillary sequencing
   (RIKEN FANTOM Pipeline: ~40,000 reads per day)




 Limited by high cost of capillary sequencing
   (Reagent cost only per read in the US$ 1 to 1.5 range)




 Hence, cDNA cloning and sequencing did not cover the entire complexity of
  transcriptomes

 Other methods needed to uncover complexity of transcriptome

Tag-based methods for high-throughput sequencing

  Short sequences (“tags”) are sufficient for transcript identification

  Short sequencing reads reduce cost

  Short sequencing reads increase throughput

  Protocol should provide 1 tag per transcript

  Digital expression profiling by counting “tags”

  Unbiased transcript discovery

  Transcript annotation using reference data

  Serial Analysis of Gene Expression (SAGE)
      (Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. (1995). "Serial analysis of gene expression". Science 270 (5235): 484–7.)
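
Digital expression profiling by counting tags can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited work: the tag sequences and the function name `tags_per_million` are made up for the example.

```python
from collections import Counter


def tags_per_million(tags):
    """Count identical tags and scale the counts to tags per million (TPM)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


# Hypothetical tags; a real library holds millions of them.
reads = ["CATGAAGT", "CATGAAGT", "CATGCCTA", "CATGAAGT", "CATGGGTC"]
tpm = tags_per_million(reads)

# Most abundant tag first.
for tag, value in sorted(tpm.items(), key=lambda item: -item[1]):
    print(tag, round(value))
```

Scaling to tags per million makes libraries of different sequencing depth comparable, which is why the thresholds later in this talk are given in TPM.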


Preparation of SAGE libraries
 [Figure: SAGE library preparation, step by step]
 mRNA pool, covered by SAGE: full-length mRNA (yes), truncated mRNA (yes), non-polyadenylated mRNA (No!)
 cDNA synthesis
 Attach cDNA to surface
 Cut with anchoring enzyme (frequent cutters, commonly NlaIII; CATG site)
 Linker ligation to open NlaIII site (linkers A and B)
 Cut with tagging enzyme (LongSAGE: MmeI, SuperSAGE: EcoP15I)
 Release from surface
 Ligation to form “Ditag”
 PCR amplification
 Cut with anchoring enzyme
 Concatenation/Cloning
 Sequencing concatemers by capillary sequencing
 Digital Gene Expression (DGE): New protocols for direct sequencing on high-speed sequencers (tag flanked by L and R)
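
The effect of the anchoring and tagging enzymes can be imitated in silico: because the cDNA is held at its 3’ end, the tag starts at the CATG site closest to that end. The sketch below is a simplification under stated assumptions. The function name `sage_tag` and the example sequence are hypothetical, and `tag_length` is an illustrative parameter, since the real tag length depends on the tagging enzyme.

```python
def sage_tag(transcript, anchor_site="CATG", tag_length=17):
    """Return the anchor site plus the bases 3' of its last occurrence, or None."""
    position = transcript.rfind(anchor_site)  # 3'-most anchoring-enzyme site
    if position == -1:
        return None  # no CATG site: this transcript yields no tag
    start = position + len(anchor_site)
    tag = transcript[start:start + tag_length]
    if len(tag) < tag_length:
        return None  # simplification: skip sites too close to the 3' end
    return anchor_site + tag


# Hypothetical transcript with two CATG sites; only the 3'-most one is used.
print(sage_tag("AACATGTTTTCATGACGTAA", tag_length=4))  # prints CATGACGT
```

Transcripts without a CATG site return `None`, which mirrors one known blind spot of SAGE-type protocols.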


Why Cap Analysis Gene Expression (CAGE)?

  Sequencing of 5’-ends allows discovery of Transcription Start Sites

  5’-end sequencing allows transcript identification

  5’-end sequencing allows promoter identification

  5’-end sequencing allows monitoring of non-polyadenylated mRNAs

  Cap-Trapper method very effective for 5’-end selection

  Cap-Trapper allows library preparation directly from total RNA

  Shift from “3’-end information” to “5’-end information”

  Cap Analysis Gene Expression (CAGE)
    (Shiraki T et al. Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15776-81. Epub 2003 Dec 8)


Flow of CAGE projects using high-speed sequencing
 [Figure: CAGE workflow from total RNA to gene network models]
 mRNA pool, covered by CAGE: full-length mRNA (yes), non-polyadenylated mRNA (yes), truncated mRNA (No!)
 CAGE Library Preparation, starting from total RNA:
   Random priming
   Biotinylation of Cap
   RNase treatment
   Use of barcodes
   Library construct: Barcode and CAGE Tag (27bp tag)
   Figure legend: Illumina sequences, Streptavidin, Biotin, EcoP15I site, Cap structure at 5’ end of mRNA
 Illumina GAIIx Sequencer (36 bp reads; image ©Illumina)
 Data processing
 CAGE Clusters mapped to genome (genome annotation): Broad and Narrow clusters on a gene with promoter and exons 1a, 1b, 2, 3, 4, 5
 Data visualization
 Gene Network Models (genome-wide view on promoter activities), with transcription factors (TF)
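
The use of barcodes implies a demultiplexing step before mapping. A minimal sketch follows, assuming the barcode sits at the start of each read and is followed directly by the 27 bp tag. The slide does not give the barcode sequences, their length, or the exact read layout, so `BARCODES`, `BARCODE_LENGTH`, and the sample names are hypothetical.

```python
from collections import defaultdict

# Hypothetical barcodes; the actual sequences and length are not given on the slide.
BARCODES = {"ACG": "Prolif-1", "GAT": "Diff-1"}
BARCODE_LENGTH = 3
TAG_LENGTH = 27  # tag length stated on the slide


def demultiplex(reads):
    """Group tags by sample, assuming each read starts with its barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LENGTH])
        if sample is None:
            continue  # unknown barcode: discard the read
        tag = read[BARCODE_LENGTH:BARCODE_LENGTH + TAG_LENGTH]
        by_sample[sample].append(tag)
    return dict(by_sample)


# Made-up 36 bp reads: barcode, 27 bp tag, trailing bases.
example_reads = [
    "ACG" + "A" * 27 + "TTTTTT",
    "GAT" + "C" * 27 + "GGGGGG",
    "TTT" + "G" * 33,  # no matching barcode
]
print(demultiplex(example_reads))
```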


Comparison of SAGE and CAGE data
  Directly compare SAGE (DGE) and CAGE from same samples

  Use of proliferating and differentiated C2C12 myoblasts as a model
     Proliferating C2C12 cells (picture provided by Willem Hoogaars)
     Differentiated C2C12 cells: Fusion into myotubes (picture provided by Willem Hoogaars)



  Use of biological triplicates

  Use of Illumina Genome Analyzer for high-speed sequencing

  Joint project of Leiden University, Genomatix, ServiceXS, and DNAFORM
     (Hestand MS et al. Nucleic Acids Res. 2010 Sep;38(16):e165)

Flow of SAGE and CAGE data analysis

 Samples: CAGE Prolif1-3, CAGE Diff1-3, SAGE Prolif1-3, SAGE Diff1-3

 Illumina sequencing (1 channel/sample, 2 technical replicates) and data processing

 Step                                  CAGE                    SAGE
 Tag preprocessing                     Remove 1 base           Add CATG
 Mapping                               2 mismatches allowed    1 mismatch allowed
 Mapped regions                        742,355                 361,655
 Regions after threshold > 2 TPM       41,862                  43,512
 Genes assigned (1,000 bp window)      10,409                  10,987

 Annotation of the 41,862 CAGE regions:
 ElDorado mouse genome: 9,957 annotated exons
 ElDorado mouse genome: 27,190 partially annotated exons
 ElDorado mouse genome: 2,368 annotated introns
 ElDorado mouse genome: 2,347 intergenic regions
 Annotated TSS: 13,541 (32%)
 Annotated promoter regions: 6,331 (15%)
 Annotated 3’-end of transcripts: 8,028 (19%)
 FANTOM 3 CAGE data set: 31,680 (76%)
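
The preprocessing, threshold, and gene-assignment steps of this flow can be sketched as follows. This is an illustration under assumptions, not the analysis code behind these numbers: the slide does not say which base is removed from CAGE tags or where CATG is added to SAGE tags, and each gene is reduced here to a single hypothetical coordinate, whereas the real assignment used genome annotation.

```python
def preprocess(read, method):
    """Prepare a raw tag for mapping, following the two preprocessing steps."""
    if method == "CAGE":
        return read[1:]        # assumption: the removed base is the first one
    if method == "SAGE":
        return "CATG" + read   # assumption: CATG is restored at the 5' end
    raise ValueError(f"unknown method: {method}")


def filter_regions(region_tpm, threshold=2.0):
    """Keep regions whose expression exceeds the threshold (> 2 TPM)."""
    return {region: tpm for region, tpm in region_tpm.items() if tpm > threshold}


def assign_gene(region_position, genes, window=1000):
    """Return the nearest gene whose position lies within the window, or None."""
    best = None
    for name, gene_position in genes.items():
        distance = abs(region_position - gene_position)
        if distance <= window and (best is None or distance < best[1]):
            best = (name, distance)
    return best[0] if best else None


# Hypothetical regions and gene coordinates on one chromosome.
regions = filter_regions({"region1": 2.0, "region2": 2.5})
genes = {"GeneA": 1000, "GeneB": 5000}
print(preprocess("GACGT", "CAGE"), preprocess("ACGT", "SAGE"))
print(regions, assign_gene(1500, genes), assign_gene(3000, genes))
```

Regions that fall outside every window stay unassigned, which is how counts of regions (tens of thousands) shrink to counts of genes (about ten thousand).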


SAGE and CAGE data sets
 Sample                          No. seq reads    No. aligned     % aligned
 CAGE: Prolif-1                    4,886,341       2,086,233         42.7

 CAGE: Prolif-1 (re-sequenced)     3,933,233       1,770,247         45.0

 CAGE: Prolif-2                    5,003,964       2,421,443         48.4

 CAGE: Prolif-3                    4,734,605       2,062,081         43.6

 CAGE: Diff-1                      4,525,321       1,679,081         37.1

 CAGE: Diff-1 (re-sequenced)       3,101,153       1,252,451         40.4

 CAGE: Diff-2                      5,060,041       2,195,263         43.4

 CAGE: Diff-3                      4,830,194       1,578,087         32.7

 SAGE: Prolif-1                    5,941,753       3,351,426         56.4

 SAGE: Prolif-2                    7,768,787       4,464,057         57.5

 SAGE: Prolif-3                    6,723,476       3,878,953         57.7

 SAGE: Diff-1                      9,467,926       5,811,947         61.4

 SAGE: Diff-2                      7,269,002       4,618,715         63.5

 SAGE: Diff-3                      4,392,416       2,494,618         56.8

 CAGE average                     4.5 mil reads   1.9 mil reads   42% aligned

 SAGE average                     6.9 mil reads   4.1 mil reads   59% aligned
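
The two summary rows can be recomputed from the per-sample counts in the table. The sketch below uses the numbers exactly as listed; the helper name `summarize` is made up for the example.

```python
# (sample, sequenced reads, aligned reads) copied from the table
cage = [
    ("Prolif-1", 4_886_341, 2_086_233),
    ("Prolif-1 (re-sequenced)", 3_933_233, 1_770_247),
    ("Prolif-2", 5_003_964, 2_421_443),
    ("Prolif-3", 4_734_605, 2_062_081),
    ("Diff-1", 4_525_321, 1_679_081),
    ("Diff-1 (re-sequenced)", 3_101_153, 1_252_451),
    ("Diff-2", 5_060_041, 2_195_263),
    ("Diff-3", 4_830_194, 1_578_087),
]
sage = [
    ("Prolif-1", 5_941_753, 3_351_426),
    ("Prolif-2", 7_768_787, 4_464_057),
    ("Prolif-3", 6_723_476, 3_878_953),
    ("Diff-1", 9_467_926, 5_811_947),
    ("Diff-2", 7_269_002, 4_618_715),
    ("Diff-3", 4_392_416, 2_494_618),
]


def summarize(rows):
    """Return mean reads, mean aligned reads, and overall percent aligned."""
    reads = sum(row[1] for row in rows)
    aligned = sum(row[2] for row in rows)
    return {
        "mean_reads": reads / len(rows),
        "mean_aligned": aligned / len(rows),
        "percent_aligned": 100 * aligned / reads,
    }


for name, rows in (("CAGE", cage), ("SAGE", sage)):
    stats = summarize(rows)
    print(name, round(stats["mean_reads"] / 1e6, 1),
          round(stats["mean_aligned"] / 1e6, 1),
          round(stats["percent_aligned"]))
```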


MyoD (myogenic marker): Viewed in UCSC Genome Browser

 [Figure: UCSC Genome Browser view of MyoD with CAGE and SAGE data]
 Narrow CAGE peak (MyoD promoter has TATA box)
 “Exon Painting”
 Transcriptional activity at 3’-end
 SAGE peak




Reproducibility of SAGE and CAGE data

 [Figure: correlation plots for CAGE and SAGE]
           Sequencing replicates    Biological replicates    Differential expression
           (CAGE only)
 CAGE      P=0.981                  P=0.963                  P=0.771
 SAGE                               P=0.930                  P=0.839




Correlation of SAGE and CAGE data

 CAGE: 10,409 genes               SAGE: 10,987 genes

                                           CAGE only    Shared    SAGE only
 Overlap all detectable genes              1,169        9,240     1,747
 Overlap differentially expressed genes    2,160        2,144     1,702
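
The overlap counts can be turned into simple agreement measures. The sketch below recomputes the per-method totals from the three Venn segments and adds the Jaccard index, a standard overlap measure that is not shown on the slide; the function name `overlap_stats` is made up for the example.

```python
def overlap_stats(only_a, shared, only_b):
    """Summarize a two-set Venn diagram given its three segment counts."""
    total_a = only_a + shared
    total_b = only_b + shared
    union = only_a + shared + only_b
    return {
        "total_a": total_a,
        "total_b": total_b,
        "shared_fraction_a": shared / total_a,
        "shared_fraction_b": shared / total_b,
        "jaccard": shared / union,
    }


# Segment counts from the slide: CAGE only, shared, SAGE only.
all_genes = overlap_stats(1169, 9240, 1747)
diff_genes = overlap_stats(2160, 2144, 1702)

print(all_genes["total_a"], all_genes["total_b"], round(all_genes["jaccard"], 2))
print(diff_genes["total_a"], diff_genes["total_b"], round(diff_genes["jaccard"], 2))
```

The totals for all detectable genes come out as 10,409 (CAGE) and 10,987 (SAGE). Agreement is clearly lower for the differentially expressed genes than for the detectable genes.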




Top 30 genes from SAGE and CAGE expression data

CAGE gene                Ratio                    Microarray             SAGE gene                   Ratio             Microarray
Hfe2                     4,073                    NA                     RP23-36P22.5                576               NA
Myom3                    1,624                    NA                     Neb                         525               NA
Lmod2                    1,305                    NA                     Mylpf                       504               Yes
Myh7                     1,124                    Yes                    Ttn                         380               NA
Mb                       908                      Yes                    Myh3                        368               Yes
RP23-36P22.5             735                      NA                     Xirp1                       306               Yes
Pygm                     717                      Yes                    1110002H13Rik               263               NA
Myl4                     614                      Yes                    Tnnc1                       232               Yes
Synpo2l                  595                      NA                     Cav3                        150               Yes
Myh1                     561                      Yes                    Cbfa2t3                     133               Yes
……                                                                       ……
13 out of 30 not found by microarray                                     10 out of 30 not found by microarray

Microarray data from same cell line published by: Tomczak KK et al. FASEB J. 2004 Feb;18(2):403-5. Epub 2003 Dec 19.
(Affymetrix mouse MG_U74Av2 and MG_U74Cv2 oligonucleotide-based GeneChips)


GO terms found in SAGE, CAGE and microarray data

 CAGE GO                                     SAGE GO                                     Microarray GO
 Regulation of striated muscle contraction   Regulation of muscle contraction            Cyclin-dependent protein kinase inhibitor activity
 Cardiac muscle contraction                  Cardiac muscle contraction                  Myogenesis
 Myogenesis                                  Myogenesis                                  Skeletal muscle development
 Regulation of muscle contraction            Regulation of striated muscle contraction   Myoblast differentiation
 Skeletal muscle development                 Skeletal muscle development                 6-phosphofructokinase activity
 Muscle development                          Myofibril assembly                          Muscle development
 Striated muscle contraction                 Muscle development                          Muscle cell differentiation
 Myoblast differentiation                    Myoblast fusion                             Tumor suppressor activity
 Muscle cell differentiation                 Striated muscle contraction                 Myofibril assembly
 Sarcomere organization                      Muscle cell differentiation                 Heart development

 10/10 muscle related                        10/10 muscle related                        7/10 muscle related

Myl1 CAGE region as example for new TSS discovery


 [Figure: genome browser view of the Myosin light chain 1 (Myl1) promoter with tracks CAGE Diff1, CAGE Prolif1, Mouse, Human, and Horse]

Differentially regulated TSS found in CAGE regions

   Found 196 new differentially regulated TSS in CAGE data

   Out of which 111 regions are upstream of known genes

   Out of which 85 regions are downstream of known genes

   [Figure: RT-PCR results for 8 tested regions (lower Cp value = higher expression)]
   Confirmation per region: +  -  +  +  +  +  +  +



   7 out of 8 regions tested could be confirmed by RT-PCR
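
A lower Cp value means higher expression because Cp is the PCR cycle at which the signal crosses the detection threshold, so more template crosses it earlier. A rough sketch of the conversion to a fold change follows, assuming perfect doubling per cycle and no normalization to a reference gene. The function name `fold_change` and the Cp values are illustrative and are not data from this slide.

```python
def fold_change(cp_reference, cp_sample, efficiency=2.0):
    """Relative expression of sample versus reference from two Cp values.

    Assumes the amount of product multiplies by `efficiency` in every cycle.
    """
    return efficiency ** (cp_reference - cp_sample)


# Hypothetical values: the sample crosses the threshold 3 cycles earlier.
print(fold_change(cp_reference=25.0, cp_sample=22.0))  # prints 8.0
```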

Discovery of novel TSS by CAGE




“Exon-painting” found in CAGE libraries

          CAGE tags with some frequency found in exons

          Exon-painting is reproducible and gene specific

          New re-capping activity suggested
          [Figure: exon-painting examples for Col1a1 and Col1a2]




New developments for CAGE method
   nanoCAGE: Preparation of CAGE libraries starting from
    as little as 50 ng total RNA
       (Plessy C et al. Nat Methods. 2010 Jul;7(7):528-34. Epub 2010 Jun 13.)



   CAGE-Scan: Use of paired-end sequencing to link new TSS
    to known genes
       (Plessy C et al. Nat Methods. 2010 Jul;7(7):528-34. Epub 2010 Jun 13.)


       [Figure: paired-end reads from transcripts (AAAAAA) linked to the gene structure 1b, 2, 3, 4, 5]




   Helicos-CAGE: Use of single-molecule sequencing to reduce
    bias in CAGE libraries and lower sample requirements

   Link high-speed sequencing to cDNA cloning: Creating the
    resources needed to study newly discovered transcripts!
Summary


 SAGE and CAGE both provide highly reproducible data sets

 SAGE and CAGE data show great overlap on genes covered

 Both methods show better coverage than microarray data

 CAGE data more complex than SAGE: 67% of genes have multiple
  CAGE regions

 CAGE data allowed for discovery of new TSS

 CAGE data indicate transcriptional activity at 3’-ends of annotated genes

 CAGE data showed some exon-painting for many transcripts


Acknowledgements

              Leiden University Medical Center:                      Genomatix:
              Matthew S. Hestand                                     Andreas Klingenhoff
              Yavuz Ariyurek                                         Matthias Scherf
              Yolande Ramos                                          Thomas Werner
              Gert-Jan B. van Ommen
              Johan T. den Dunnen
              Peter A.C. ‘t Hoen


              DNAFORM:                                               ServiceXS:
              Makoto Suzuki                                          Wilbert van Workum




                              Omics Science Center RIKEN Yokohama:
                              Piero Carninci
                              Charles Plessy
                              Yoshihide Hayashizaki


Thank you for your interest!




Matthias Harbers               24

More Related Content

What's hot

An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
Protein-Protein Interactions (PPIs)
Protein-Protein Interactions (PPIs)Protein-Protein Interactions (PPIs)
Protein-Protein Interactions (PPIs)Sai Ram
 
Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS) Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS) Bharathiar university
 
Basic information of s1 nuclease
Basic information of s1 nucleaseBasic information of s1 nuclease
Basic information of s1 nucleaseDynah Perry
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomicsAthira RG
 
THIRD GEN SEQUENCING.pptx
THIRD GEN SEQUENCING.pptxTHIRD GEN SEQUENCING.pptx
THIRD GEN SEQUENCING.pptxRITHIKA R S
 
Functional proteomics, and tools
Functional proteomics, and toolsFunctional proteomics, and tools
Functional proteomics, and toolsKAUSHAL SAHU
 
Rapid amplification of c-DNA ends
Rapid amplification of c-DNA endsRapid amplification of c-DNA ends
Rapid amplification of c-DNA endsLovnish Thakur
 
Study of Transcriptome
Study of TranscriptomeStudy of Transcriptome
Study of TranscriptomeBOTANYWith
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionAashish Patel
 
SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)talhakhat
 
Lectut btn-202-ppt-l33. site-directed mutagenesis
Lectut btn-202-ppt-l33. site-directed mutagenesisLectut btn-202-ppt-l33. site-directed mutagenesis
Lectut btn-202-ppt-l33. site-directed mutagenesisRishabh Jain
 

What's hot (20)

Pyrosequencing
PyrosequencingPyrosequencing
Pyrosequencing
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
Protein-Protein Interactions (PPIs)
Protein-Protein Interactions (PPIs)Protein-Protein Interactions (PPIs)
Protein-Protein Interactions (PPIs)
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS) Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS)
 
Basic information of s1 nuclease
Basic information of s1 nucleaseBasic information of s1 nuclease
Basic information of s1 nuclease
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
THIRD GEN SEQUENCING.pptx
THIRD GEN SEQUENCING.pptxTHIRD GEN SEQUENCING.pptx
THIRD GEN SEQUENCING.pptx
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Functional proteomics, and tools
Functional proteomics, and toolsFunctional proteomics, and tools
Functional proteomics, and tools
 
Analysis of gene expression
Analysis of gene expressionAnalysis of gene expression
Analysis of gene expression
 
Rapid amplification of c-DNA ends
Rapid amplification of c-DNA endsRapid amplification of c-DNA ends
Rapid amplification of c-DNA ends
 
222397 lecture 16 17
222397 lecture 16 17222397 lecture 16 17
222397 lecture 16 17
 
Ion Torrent Sequencing
Ion Torrent SequencingIon Torrent Sequencing
Ion Torrent Sequencing
 
Study of Transcriptome
Study of TranscriptomeStudy of Transcriptome
Study of Transcriptome
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene Expression
 
SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)
 
Lectut btn-202-ppt-l33. site-directed mutagenesis
Lectut btn-202-ppt-l33. site-directed mutagenesisLectut btn-202-ppt-l33. site-directed mutagenesis
Lectut btn-202-ppt-l33. site-directed mutagenesis
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 

Similar to Tag-based transcript sequencing: Comparison of SAGE and CAGE

Genomics lecture 3
Genomics lecture 3Genomics lecture 3
Genomics lecture 3iainj88
 
Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGScursoNGS
 
Sequencing the transcriptome reveals complex layers of regulation, Department...
Sequencing the transcriptome reveals complex layers of regulation, Department...Sequencing the transcriptome reveals complex layers of regulation, Department...
Sequencing the transcriptome reveals complex layers of regulation, Department...Copenhagenomics
 
Approaches to cDNA Cloning and Analysis
Approaches to cDNA Cloning and AnalysisApproaches to cDNA Cloning and Analysis
Approaches to cDNA Cloning and AnalysisMatthias Harbers
 
David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011
David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011
David-Emlyn Parfitt, Columbia Illumina seminar 11/9/2011sequencing_columbia
 
Rna seq and chip seq
Rna seq and chip seqRna seq and chip seq
Rna seq and chip seqJyoti Singh
 
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...fruitbreedomics
 
Annotating nc-RNAs with Rfam
Annotating nc-RNAs with RfamAnnotating nc-RNAs with Rfam
Annotating nc-RNAs with RfamLuca Cozzuto
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Yaoyu Wang
 
Transcription
TranscriptionTranscription
Transcriptionjoyjulie
 
Central dogma
Central dogmaCentral dogma
Central dogmaneizylah
 
Anne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentationAnne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentationAnne Vaittinen
 
Bacterial rna sequencing
Bacterial rna sequencingBacterial rna sequencing
Bacterial rna sequencingDynah Perry
 
Bioinformatics workshop Sept 2014
Bioinformatics workshop Sept 2014Bioinformatics workshop Sept 2014
Bioinformatics workshop Sept 2014LutzFr
 

Tag-based transcript sequencing: Comparison of SAGE and CAGE

  • 1. Tag-based transcript sequencing: Comparison of SAGE and CAGE. Matthias Harbers. European Meeting on Next Generation Sequencing, August 29 to September 1, 2010, Leiden University Medical Center, The Netherlands.
  • 2. Focusing on transcriptome analysis (diagram)
      • Nucleus: genomic DNA (storage of information) with promoter, transcription factors, transcript start site and “gene”; transcription by RNA polymerase II
      • Processed mRNA (transport of information) with 7-methylguanosine cap and poly(A) tail
      • Cytoplasm: translation at ribosome gives protein (tools to operate “functions”); non-coding RNAs (mostly regulatory functions?)
      • Transcript information is the basis for understanding genomes, proteins, and non-coding RNAs!
  • 3. cDNA cloning and sequencing (diagram)
      • Genome (exons 1a, 1b, 2, 3, 4, 5), mRNA, mRNA pool
      • cDNA library preparation, cDNA library, random clone picking, high-throughput sequencing
      • Results: databases and clone resources (representative clone for each gene)
      • Great asset for the research community! Functional analysis of genes.
  • 4. What did we learn from cDNA cloning projects?
      • End-sequencing of cDNA clones was the first approach to transcript discovery
      • Great improvements in full-length cDNA cloning
      • Building of large cDNA collections (FANTOM, MGC, others…)
      • Limited by throughput of capillary sequencing (RIKEN FANTOM pipeline: ~40,000 reads per day)
      • Limited by high cost of capillary sequencing (reagent cost alone in the range of US$ 1 to 1.5 per read)
      • Hence, cDNA cloning and sequencing did not cover the entire complexity of transcriptomes
      • Other methods needed to uncover the complexity of the transcriptome
  • 5. Tag-based methods for high-throughput sequencing
      • Short sequences (“tags”) are sufficient for transcript identification
      • Short sequencing reads reduce cost
      • Short sequencing reads increase throughput
      • Protocol should provide 1 tag per transcript
      • Digital expression profiling by counting “tags”
      • Unbiased transcript discovery
      • Transcript annotation using reference data
      • Serial Analysis of Gene Expression (SAGE) (Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. (1995). "Serial analysis of gene expression". Science 270 (5235): 484–7.)
  • 6. Preparation of SAGE libraries (diagram)
      • mRNA pool: full-length mRNA is captured, truncated mRNA is captured, non-polyadenylated mRNA is not
      • cDNA synthesis
      • Attach cDNA to surface
      • Cut with anchoring enzyme (frequent cutters, commonly NlaIII)
      • Linker ligation (linkers A and B) to open NlaIII site (CATG)
      • Cut with tagging enzyme (LongSAGE: MmeI, SuperSAGE: EcoP15I)
      • Release from surface
      • Ligation to form “ditag”
      • PCR amplification
      • Cut with anchoring enzyme
      • Concatenation/cloning, then sequencing of concatemers by capillary sequencing
      • Digital Gene Expression (DGE): new protocols for direct sequencing on high-speed sequencers
  • 7. Why Cap Analysis Gene Expression (CAGE)?
      • Sequencing of 5’ ends allows discovery of Transcription Start Sites
      • 5’-end sequencing allows transcript identification
      • 5’-end sequencing allows promoter identification
      • 5’-end sequencing allows monitoring of non-polyadenylated mRNAs
      • Cap-Trapper method very effective for 5’-end selection
      • Cap-Trapper allows library preparation directly from total RNA
      • Shift from “3’-end information” to “5’-end information”
      • Cap Analysis Gene Expression (CAGE) (Shiraki T et al. Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15776-81. Epub 2003 Dec 8)
  • 8. Flow of CAGE projects using high-speed sequencing (diagram)
      • Starting from total RNA (mRNA pool): full-length mRNA is captured, non-polyadenylated mRNA is captured, truncated mRNA is not
      • Random priming
      • Biotinylation of cap (cap structure at 5’ end of mRNA)
      • RNase treatment and capture with streptavidin
      • EcoP15I site used to cut the tag
      • CAGE library preparation: barcode plus CAGE tag (27 bp tag); use of barcodes
      • Illumina GAIIx sequencer (36 bp reads; image © Illumina)
      • Data processing: Illumina sequences mapped to genome
      • CAGE clusters (genome annotation): broad and narrow
      • Data visualization
      • Gene network models (genome-wide view on promoter activities)
  • 9. Comparison of SAGE and CAGE data
      • Directly compare SAGE (DGE) and CAGE from the same samples
      • Use of proliferating and differentiated C2C12 myoblasts as a model: proliferating C2C12 cells, and differentiated C2C12 cells after fusion into myotubes (pictures provided by Willem Hoogaars)
      • Use of biological triplicates
      • Use of Illumina Genome Analyzer for high-speed sequencing
      • Joint project of Leiden University, Genomatix, ServiceXS, DNAFORM (Hestand MS et al., Nucleic Acids Res. 2010 Sep;38(16):e165)
  • 10. Flow of SAGE and CAGE data analysis (diagram)
      • Samples: CAGE Prolif1-3, CAGE Diff1-3, SAGE Prolif1-3, SAGE Diff1-3
      • Illumina sequencing (1 channel/sample, 2 technical replicates) and data processing
      • CAGE: remove 1 base; mapping with 2 mismatches allowed; 742,355 regions; threshold set to > 2 TPM; 41,862 regions
      • SAGE: add CATG; mapping with 1 mismatch allowed; 361,655 regions; threshold set to > 2 TPM; 43,512 regions
      • CAGE regions against the ElDorado mouse genome: 9,957 annotated exons; 27,190 partially annotated exons; 2,368 annotated introns; 2,347 intergenic regions
      • CAGE regions against other annotation: annotated TSS 13,541 (32%); annotated promoter regions 6,331 (15%); annotated 3’-end of transcripts 8,028 (19%); FANTOM 3 CAGE data set 31,680 (76%)
      • Assigning CAGE and SAGE regions to genes (1,000 bp window): CAGE 10,409 genes; SAGE 10,987 genes
  • 11. SAGE and CAGE data sets
      Sample                          No. seq reads   No. aligned   % aligned
      CAGE: Prolif-1                  4,886,341       2,086,233     42.7
      CAGE: Prolif-1 (re-sequenced)   3,933,233       1,770,247     45.0
      CAGE: Prolif-2                  5,003,964       2,421,443     48.4
      CAGE: Prolif-3                  4,734,605       2,062,081     43.6
      CAGE: Diff-1                    4,525,321       1,679,081     37.1
      CAGE: Diff-1 (re-sequenced)     3,101,153       1,252,451     40.4
      CAGE: Diff-2                    5,060,041       2,195,263     43.4
      CAGE: Diff-3                    4,830,194       1,578,087     32.7
      SAGE: Prolif-1                  5,941,753       3,351,426     56.4
      SAGE: Prolif-2                  7,768,787       4,464,057     57.5
      SAGE: Prolif-3                  6,723,476       3,878,953     57.7
      SAGE: Diff-1                    9,467,926       5,811,947     61.4
      SAGE: Diff-2                    7,269,002       4,618,715     63.5
      SAGE: Diff-3                    4,392,416       2,494,618     56.8
      CAGE average                    4.5 mil reads   1.9 mil reads 42% aligned
      SAGE average                    6.9 mil reads   4.1 mil reads 59% aligned
  • 12. MyoD (myogenic marker): Viewed in UCSC Genome Browser. Figure labels: narrow CAGE peak (MyoD promoter has TATA box); SAGE peak; transcriptional activity at 3’-end; “exon painting”.
  • 13. Reproducibility of SAGE and CAGE data. Scatter plot panels: sequencing replica, biological replica, differential expression (CAGE only). CAGE: P=0.981, P=0.963, P=0.771. SAGE: P=0.930, P=0.839.
  • 14. Correlation of SAGE and CAGE data. CAGE: 10,409 genes; SAGE: 10,987 genes.
      • Overlap of all detectable genes (Venn diagram, CAGE left, SAGE right): 1,169 / 9,240 / 1,747
      • Overlap of differentially expressed genes (Venn diagram, CAGE left, SAGE right): 2,160 / 2,144 / 1,702
  • 15. Top 30 genes from SAGE and CAGE expression data
      CAGE gene      Ratio   Microarray   |  SAGE gene       Ratio   Microarray
      Hfe2           4,073   NA           |  RP23-36P22.5    576     NA
      Myom3          1,624   NA           |  Neb             525     NA
      Lmod2          1,305   NA           |  Mylpf           504     Yes
      Myh7           1,124   Yes          |  Ttn             380     NA
      Mb             908     Yes          |  Myh3            368     Yes
      RP23-36P22.5   735     NA           |  Xirp1           306     Yes
      Pygm           717     Yes          |  1110002H13Rik   263     NA
      Myl4           614     Yes          |  Tnnc1           232     Yes
      Synpo21        595     NA           |  Cav3            150     Yes
      Myh1           561     Yes          |  Cbfa2t3         133     Yes
      ……                                  |  ……
      CAGE: 13 out of 30 not found by microarray. SAGE: 10 out of 30 not found by microarray.
      Microarray data from the same cell line published by: Tomczak KK et al. FASEB J. 2004 Feb;18(2):403-5. Epub 2003 Dec 19. (Affymetrix mouse MG_U74Av2 and MG_U74Cv2 oligonucleotide-based GeneChips)
  • 16. GO terms found in SAGE, CAGE and microarray data
      CAGE GO                                     | SAGE GO                                     | Microarray GO
      Regulation of striated muscle contraction   | Regulation of muscle contraction            | Cyclin-dependent protein kinase inhibitor activity
      Cardiac muscle contraction                  | Cardiac muscle contraction                  | Myogenesis
      Myogenesis                                  | Myogenesis                                  | Skeletal muscle development
      Regulation of muscle contraction            | Regulation of striated muscle contraction   | Myoblast differentiation
      Skeletal muscle development                 | Skeletal muscle development                 | 6-phosphofructokinase activity
      Muscle development                          | Myofibril assembly                          | Muscle development
      Striated muscle contraction                 | Muscle development                          | Muscle cell differentiation
      Myoblast differentiation                    | Myoblast fusion                             | Tumor suppressor activity
      Muscle cell differentiation                 | Striated muscle contraction                 | Myofibril assembly
      Sarcomere organization                      | Muscle cell differentiation                 | Heart development
      10/10 muscle related                        | 10/10 muscle related                        | 7/10 muscle related
  • 17. Myl1 CAGE region as example for new TSS discovery. Figure: myosin light chain 1 (Myl1) promoter; track labels: CAGE Diff1, CAGE Prolif1, Mouse, Human, Horse.
  • 18. Differentially regulated TSS found in CAGE regions
      • Found 196 new differentially regulated TSS in CAGE data
      • Of these, 111 regions are upstream of known genes
      • Of these, 85 regions are downstream of known genes
      • RT-PCR results for the regions tested (lower Cp value = higher expression): + - + + + + + +
      • 7 out of 8 regions tested could be confirmed by RT-PCR
  • 19. Discovery of novel TSS by CAGE
  • 20. “Exon-painting” found in CAGE libraries
      • CAGE tags are found at some frequency in exons
      • Exon-painting is reproducible and gene specific
      • New re-capping activity suggested
      • Examples shown: Col1a1, Col1a2
  • 21. New developments for CAGE method
      • nanoCAGE: Preparation of CAGE libraries starting from as little as 50 ng total RNA (Plessy C et al. Nat Methods. 2010 Jul;7(7):528-34. Epub 2010 Jun 13.)
      • CAGE-Scan: Use of paired-end sequencing to link new TSS to known genes (Plessy C et al. Nat Methods. 2010 Jul;7(7):528-34. Epub 2010 Jun 13.)
      • Helicos-CAGE: Use of single-molecule sequencing to reduce bias in CAGE libraries and reduce sample requirements
      • Link high-speed sequencing to cDNA cloning: Creating the resources needed to study newly discovered transcripts!
  • 22. Summary
      • SAGE and CAGE both provide highly reproducible data sets
      • SAGE and CAGE data show great overlap in the genes covered
      • Both methods show better coverage than microarray data
      • CAGE data are more complex than SAGE data: 67% of genes have multiple CAGE regions
      • CAGE data allowed for discovery of new TSS
      • CAGE data indicate transcriptional activity at 3’-ends of annotated genes
      • CAGE data showed some exon-painting for many transcripts
  • 23. Acknowledgements
      • Leiden University Medical Center: Matthew S. Hestand, Yavuz Ariyurek, Yolande Ramos, Gert-Jan B. van Ommen, Johan T. den Dunnen, Peter A.C. ‘t Hoen
      • Genomatix: Andreas Klingenhoff, Matthias Scherf, Thomas Werner
      • DNAFORM: Makoto Suzuki
      • ServiceXS: Wilbert van Workum
      • Omics Science Center, RIKEN Yokohama: Piero Carninci, Charles Plessy, Yoshihide Hayashizaki
  • 24. Thank you for your interest!
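
The tag definition behind the SAGE library preparation on slide 6 can be made concrete in a few lines of Python. This is a minimal sketch, not the pipeline used in the study: it assumes each cDNA is given as a plain 5' to 3' string, that the tag is the sequence immediately downstream of the 3'-most CATG (the NlaIII site), and that the tag is 17 bases long. The function name and the tag length are illustrative choices, not taken from the slides.

```python
from collections import Counter


def three_prime_most_tag(cdna, anchor="CATG", tag_len=17):
    """Return the tag downstream of the 3'-most anchoring site, or None.

    tag_len=17 is an assumed value for illustration; real tag lengths
    depend on the tagging enzyme used.
    """
    pos = cdna.rfind(anchor)  # 3'-most occurrence of the anchoring site
    if pos == -1:
        return None  # transcript has no anchoring site, so it yields no tag
    start = pos + len(anchor)
    tag = cdna[start:start + tag_len]
    # Discard tags that run off the end of the sequence.
    return tag if len(tag) == tag_len else None


# Hypothetical transcripts: two copies of one sequence and one without CATG.
transcripts = [
    "GGCATGAAACCCCATGACGTACGTACGTACGTACGAAAAAA",
    "GGCATGAAACCCCATGACGTACGTACGTACGTACGAAAAAA",
    "GGGGTTTTAAAACCCC",
]

# Digital expression profiling: count how often each tag is observed.
tag_counts = Counter(
    tag for tag in map(three_prime_most_tag, transcripts) if tag is not None
)
```

Because only the 3'-most site is used, each transcript contributes at most one tag, which is the "1 tag per transcript" property listed on slide 5.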
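
The "> 2 TPM" threshold in the analysis flow on slide 10 can be illustrated with a short Python sketch. The slides do not say how TPM was computed (per sample or pooled, before or after clustering), so this assumes the simplest reading: tags per million mapped tags within one library. Region names and counts are invented for the example.

```python
def tags_per_million(counts):
    """Scale raw tag counts per region to tags per million (TPM)."""
    total = sum(counts.values())
    if total == 0:
        return {region: 0.0 for region in counts}
    return {region: n * 1_000_000 / total for region, n in counts.items()}


def filter_regions(counts, min_tpm=2.0):
    """Keep regions whose TPM is strictly above min_tpm."""
    tpm = tags_per_million(counts)
    return {region: value for region, value in tpm.items() if value > min_tpm}


# Hypothetical library with exactly one million mapped tags,
# so raw counts and TPM values coincide.
example_counts = {
    "region_A": 50,
    "region_B": 3,
    "region_C": 1,
    "region_D": 999_946,
}

kept = filter_regions(example_counts)
```

In this toy library `region_C` falls below the threshold and is dropped, which mirrors how the filter on slide 10 reduces 742,355 CAGE regions to 41,862.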
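
The gene overlap on slide 14 can be checked with simple arithmetic. The sketch below assumes the three Venn numbers for all detectable genes read as CAGE-only, shared, SAGE-only (1,169 / 9,240 / 1,747); that reading reproduces the totals of 10,409 CAGE genes and 10,987 SAGE genes given on slide 10. The helper name and the derived fractions are illustrative and do not appear in the slides.

```python
def venn_summary(only_a, shared, only_b):
    """Summarise a two-set Venn diagram given its three region sizes."""
    total_a = only_a + shared
    total_b = only_b + shared
    union = only_a + shared + only_b
    return {
        "total_a": total_a,
        "total_b": total_b,
        "union": union,
        "frac_a_shared": shared / total_a,  # share of set A also found in B
        "frac_b_shared": shared / total_b,  # share of set B also found in A
        "jaccard": shared / union,          # shared genes over all genes seen
    }


# Set A = CAGE, set B = SAGE, numbers from the "all detectable genes" diagram.
all_genes = venn_summary(only_a=1169, shared=9240, only_b=1747)
```

Under this reading, roughly 89% of CAGE genes and 84% of SAGE genes are detected by both methods.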