Approaches to cDNA Cloning and Analysis



The analysis of all transcripts within a cell is of essential importance. Molecular biology provides many approaches to clone RNA transcripts into cDNA. Large cDNA collections are in the public domain to serve the research community. Today, however, new high-speed sequencing methods allow a much deeper view into transcriptomes than possible by classical cloning.

Published in: Health & Medicine

  1. Approaches to cDNA Cloning and Analysis
     Dr. Matthias Harbers, Chief Scientist, DNAFORM Inc.; Co-assigned Scientist at the RIKEN Omics Center
     © Matthias Harbers 2008
  2. Classical View on the Utilization of Genomic Information
     [Figure: flow of genomic information in the cell]
     • Nucleus: genomic DNA (storage of information) with promoter, transcript start site, and “gene”; transcription factors; transcription by RNA polymerase II.
     • Coding mRNA (transport of information): carries a cap (7-methylguanosine cap or m7G cap) and a poly(A) tail.
     • Cytoplasm: translation at the ribosome into protein (tools to operate “functions”).
     Developed in the 1950s and 1960s.
  3. The Classical View Has Been Challenged by New Developments
     Discovery/Project | Importance | Year
     Discovery of reverse transcriptases | DNA can be synthesized from RNA templates | 1969
     Discovery of ligase and restriction endonucleases | Establishing DNA recombination, DNA cloning, and preparation of DNA libraries | 1960s and 70s
     DNA sequencing | Chain-termination method (“Sanger Sequencing”) | 1975
     Human Genome Project | Move to sequencing entire genomes | 1990 to 2003
     Expressed sequence tags (ESTs) | First attempt at gene discovery and expression profiling | 1991
     IMAGE Project | Program to create cDNA collections from key organisms | 1993 to 2007
     ENCODE Project | Functional elements in human genome | Since 2003
  4. Topics of the Presentation
     • Approaches to cDNA cloning
     • Special topics related to cDNA cloning
     • Large-scale cDNA cloning projects
     • Small RNA (sRNA) cloning
     • Tag-based approaches
     • Next-Generation Sequencing
     • Where do we go from here?
  5. Approaches to cDNA cloning
     [Figure: workflow starting from capped and polyadenylated mRNA (5’ cap, 3’ poly(A) tail)]
     • 1st strand cDNA synthesis: commonly oligo(dT) priming.
     • Priming of 2nd strand cDNA synthesis: 5’-linker ligation or tailing reaction.
     • 2nd strand synthesis (optionally followed by PCR).
     • Digestion with cloning enzyme(s): methylation can protect against internal cleavage within the cDNA.
     • Ligation into phage or plasmid vector (a plasmid with cDNA insert may be excised from a phage vector).
  6. Special Topics Related to cDNA Cloning
     • Synthesis of very long cDNAs (>10,000 bp, not further discussed)
     • Full-length cDNA cloning (important to obtain functional cDNAs)
     • Normalization (key to gene discovery in large-scale projects)
     • Cloning vectors and applications (not further discussed)
     • Subtractive cloning (not further discussed)
     • Expression cloning (not further discussed)
     • Addressing splicing (left out of large-scale projects)
     Ref.: Harbers M: The current status of cDNA cloning. Genomics. 2008 Mar;91(3):232-42.
  7. Use of cDNA Libraries
     • Isolation of individual target genes in research laboratories
     • Transcriptome analysis and genome projects:
       • Large-scale random clone picking
       • End-sequencing to build transcript catalogs
       • Full-length sequencing of selected clones
       • Creation of sequence databases
       • Creation of cDNA collections
     Ref.: Carninci P et al.: Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res. 2003 Jun;13(6B):1273-89.
  8. Benefits of Large-Scale cDNA Cloning Projects
     [Figure: improved cDNA cloning technology yields sequence data and clone collections, which feed into the following fields]
     • SNP analysis: location in promoter or exon
     • Proteomics: functional studies on proteins
     • Functional studies: RNAi, knock down
     • Gene regulation: promoter identification, expression profiling
     • Genomics: gene discovery, mapping, noncoding RNA, sense-antisense pairs
     Public sequence databases and clone collections are essential tools for research!
  9. The mRNA Pool of a Cell
     Number of transcripts | Share of mRNA
     5 to 10 transcripts | up to 20% of mRNA
     500 to 2,000 transcripts | 40 to 60% of mRNA
     10,000 to 20,000 transcripts | <20% of mRNA
     (Old numbers estimated from reassociation and hybridization studies.)
     Discovery of rarely expressed genes is a difficult task!
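
Why rare transcripts are hard to find can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that clones are picked at random and independently, and that each abundance class is evenly populated, so one rare transcript makes up at most 0.20/10,000 of the mRNA. The function name `clones_needed` and the 99% confidence level are illustrative assumptions, not values from the slides.

```python
import math

def clones_needed(fraction: float, confidence: float = 0.99) -> int:
    """Number of random clones to pick so that a transcript making up
    `fraction` of the mRNA pool is seen at least once with probability
    `confidence` (independent, unbiased picking is assumed)."""
    if not 0.0 < fraction < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("fraction and confidence must lie strictly between 0 and 1")
    # P(not seen in n picks) = (1 - fraction)^n <= 1 - confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))

# Rare class: 10,000 to 20,000 transcripts share <20% of the mRNA.
# Upper bound for a single rare transcript, assuming an evenly populated class.
rare_fraction = 0.20 / 10_000
# Abundant class: 5 to 10 transcripts share up to 20% of the mRNA.
abundant_fraction = 0.20 / 10

print("abundant transcript:", clones_needed(abundant_fraction), "clones")
print("rare transcript:    ", clones_needed(rare_fraction), "clones")
```

Under these assumptions a few hundred clones suffice for an abundant transcript, while a rare one requires on the order of hundreds of thousands of clones, which is the motivation for normalization.
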
  10. Normalization of cDNA Libraries
     During a normalization step, a cDNA pool is hybridized against an aliquot of the original mRNA sample or against the same cDNA pool. Because hybridization kinetics depend on concentration, the number of clones representing highly expressed genes is reduced, yielding a more equal distribution of different cDNAs in the library.
     [Figure: gel of pancreas cDNA without and with normalization/subtraction, next to a λ/Hind III size marker (9.4, 6.6, 4.4, 2.2, 2.0, and 0.5 kbp); highly expressed genes are marked.]
     [Figure: number of non-redundant clones versus number of libraries for Lib. 1; Lib. 2, no driver; Lib. 3 + Driver 1; Lib. 4 + Driver 2.]
     Combine normalization and subtraction for higher gene discovery.
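
The concentration dependence described for normalization can be sketched with an idealized second-order reassociation model, in which the amount of a species that is still single-stranded after time t is c0 / (1 + k·c0·t). This is a simplification: the rate constant `k`, the time `t`, and the toy abundances are arbitrary assumptions, and real protocols hybridize against a driver rather than letting each species reanneal on its own.

```python
from typing import Dict

def single_stranded_remaining(c0: float, k: float, t: float) -> float:
    """Amount still single-stranded after time t under idealized
    second-order reassociation: c(t) = c0 / (1 + k * c0 * t)."""
    return c0 / (1.0 + k * c0 * t)

def normalize(abundances: Dict[str, float], k: float, t: float) -> Dict[str, float]:
    """Keep only the single-stranded (non-hybridized) fraction of each species."""
    return {name: single_stranded_remaining(c0, k, t) for name, c0 in abundances.items()}

# Toy library in arbitrary concentration units (hypothetical values).
library = {"abundant": 1000.0, "intermediate": 10.0, "rare": 1.0}
after = normalize(library, k=1.0, t=1.0)

ratio_before = library["abundant"] / library["rare"]
ratio_after = after["abundant"] / after["rare"]
print(f"abundant:rare before = {ratio_before:.0f}:1, after = {ratio_after:.1f}:1")
```

Abundant species hybridize fastest and are depleted most strongly, so the spread between abundant and rare species shrinks from 1000:1 to roughly 2:1 in this toy example.
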
  11. Full-Length cDNA Cloning
     “Cap Trapper” method:
       • 1st strand cDNA synthesis on capped, polyadenylated mRNA (oligo(dT) primer shown)
       • Chemical reaction: biotinylation of the cap
       • RNase I digestion
       • Recovery on beads
       Key steps: biotinylation of cap structure and RNase I treatment.
     “Oligo Capping” method:
       • Phosphatase treatment
       • Pyrophosphatase treatment
       • RNA ligase: ligation of an adaptor to the mRNA
       • cDNA synthesis; the adaptor sequence provides a primer site
       Key steps: replacement of cap structure by RNA oligonucleotide.
  12. Examples for Large-Scale cDNA Cloning Projects
     These projects target the cloning and full-length sequencing of “one representative” cDNA clone for each gene. This reduces cost, but it entirely ignores splicing events.
     Project | Organisms (URL column not preserved)
     IMAGE Consortium | Human, mouse, rat, zebrafish, fugu, Xenopus (X. laevis and X. tropicalis), cow, and primate
     Mammalian Gene Collection (MGC) | Human, mouse, rat, cow, others
     Tokyo University | Human
     FANTOM | Mouse full-length cDNA
     Rice |
     Arabidopsis | Arabidopsis (URL fragment: news/071015.shtml)
     ORF Consortium | Human (some mouse clones)
  13. Pre-mRNA is Spliced into mRNA
     Large-scale cloning projects do not cover splice variants.
     But maybe 75% of all signal transducers are regulated by splicing!
  14. Capturing Alternatively Spliced Exons in mRNA
     [Figure: the sense strand from sample 1 is hybridized to the antisense strand from sample 2.]
     • Cut double-stranded regions
     • Capture single-stranded regions
     Ref.: Watahiki A et al.: Libraries enriched for alternatively spliced exons reveal splicing patterns in melanocytes and melanomas. Nature Methods 2004 Dec 1(3): 233-9.
  15. The Discovery of Small RNAs
     • Classical cloning protocols removed all cDNA fragments of less than 500 bp (to avoid linker contamination; size cutoff of cloning vectors).
     • Proteins of less than 100 amino acids were commonly not annotated.
     • However, small RNAs have important functions!
     • Small RNAs are non-coding RNAs (ncRNAs), often derived from maturation processes in the cell that include digestion steps by RNases.
     • Most prominent example: microRNAs (miRNAs) have sequences that are reverse complements of other mRNA transcripts. They are around 21-23 nucleotides long after maturation and can alter the expression/translation of one or several target genes through RNA interference.
     • And we are still finding many more new RNA species!
     Ref.: Kawaji H, Hayashizaki Y. Exploration of small RNAs. PLoS Genet. 2008 Jan;4(1):e22.
  16. Small RNA (sRNA) Cloning
     [Figure: workflow starting from a short RNA with 5’ phosphate and 3’ OH]
     • Modify 3’ end: C-tailing or adaptor ligation.
     • Modify 5’ end: here by adaptor ligation.
     • 1st strand cDNA synthesis.
     • 2nd strand synthesis and PCR.
     • Sequence analysis: direct sequencing of DNA fragments (option to ligate into plasmid vector).
     Key steps: modification of 5’ and 3’ ends of the RNA for PCR amplification. Selection by size range. Commonly only sequenced. No cloning needed, as short cDNAs can be chemically synthesized.
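
When such libraries are sequenced directly, the reads still carry the 3’ adaptor, and the size selection can be repeated computationally. The following is a minimal sketch of adaptor trimming followed by a length filter. The adaptor sequence, the 18 to 30 nt window, and the function names are hypothetical placeholders, not part of the protocol described on the slide.

```python
from typing import Iterable, List, Optional

def trim_3p_adaptor(read: str, adaptor: str, min_overlap: int = 6) -> Optional[str]:
    """Return the insert preceding the 3' adaptor, or None if no adaptor is found."""
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]
    # The adaptor may run off the end of the read: look for the longest
    # adaptor prefix (at least min_overlap bases) that ends the read.
    for overlap in range(min(len(adaptor) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[:-overlap]
    return None

def select_small_rnas(reads: Iterable[str], adaptor: str,
                      min_len: int = 18, max_len: int = 30) -> List[str]:
    """Trim the adaptor and keep inserts within the chosen size range."""
    inserts = (trim_3p_adaptor(read, adaptor) for read in reads)
    return [s for s in inserts if s is not None and min_len <= len(s) <= max_len]

ADAPTOR = "CTGTAGGCAC"             # placeholder adaptor sequence (assumption)
INSERT = "TGAGGTAGTAGGTTGTATAGTT"  # 22 nt example insert

reads = [
    INSERT + ADAPTOR,       # full adaptor present
    "ACGT" + ADAPTOR,       # insert too short, removed by size selection
    INSERT + "AAAAAAAA",    # no adaptor found, discarded
    INSERT + ADAPTOR[:7],   # adaptor truncated at the end of the read
]
print(select_small_rnas(reads, ADAPTOR))
```

Exact matching is used here for brevity; a real pipeline would also have to tolerate sequencing errors in the adaptor.
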
  17. Tag-Based Approaches
     • Gene discovery cannot be done by standard methods used in expression profiling such as microarray or PCR.
     • Unsupervised approaches are needed for gene discovery that do not require sequence information for probe design.
     • The first approach to gene discovery was sequencing of 3’ ends of cDNA clones (EST sequencing). It requires one read per clone.
     • Gene identification does not require sequences of 500 to 800 bp; much shorter sequences of some 20 bp or less are sufficient.
     • Use long sequencing reads to cover many short fragments in one run.
     • New protocols to isolate short fragments from RNA.
     • Tag-based approaches in expression profiling and gene discovery.
     Ref.: Harbers M and Carninci P: Tag-based approaches for transcriptome research and genome annotation. Nature Methods 2005 Jul 2(7): 495-502.
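
The claim that tags of some 20 bp are sufficient for gene identification can be checked with simple arithmetic: a tag of length n has 4^n possible sequences, so in a random-sequence model the expected number of chance occurrences in a genome of G bp (both strands) is about 2G/4^n. The sketch below uses the haploid human genome size of 3.3 × 10^9 bp quoted later in the talk. The random-sequence model is an assumption that ignores repeats and biased base composition, and the function names are illustrative.

```python
def expected_random_hits(tag_length: int, genome_size: float, both_strands: bool = True) -> float:
    """Expected number of chance occurrences of one specific tag
    in a random sequence of the given size."""
    strands = 2 if both_strands else 1
    return strands * genome_size / 4 ** tag_length

def min_unique_tag_length(genome_size: float) -> int:
    """Shortest tag length with fewer than one expected chance occurrence."""
    n = 1
    while expected_random_hits(n, genome_size) >= 1.0:
        n += 1
    return n

HUMAN_GENOME_BP = 3.3e9  # haploid human genome size as quoted in the talk

for n in (14, 20, 27):
    print(f"{n} bp tag: {expected_random_hits(n, HUMAN_GENOME_BP):.2e} expected chance hits")
print("shortest tag with <1 expected chance hit:", min_unique_tag_length(HUMAN_GENOME_BP), "bp")
```

In this model a 20 bp tag is expected to occur by chance far less than once in a human-sized genome, whereas a 14 bp tag would match at many positions.
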
  18. Tag-Based Approaches
     [Figure: positions of tags along a capped and polyadenylated mRNA. Paired-end tags (PETs) combine the 5’ end and the 3’ end. After cap selection (5’ related): CAGE and 5’ SAGE. At anchoring enzyme sites: SAGE. After removal of poly(A) (3’ related): 3’ SAGE. Also shown: MPSS, DGE, and RNA-Seq or other shotgun approaches.]
  19. Serial Analysis of Gene Expression (SAGE) / Digital Gene Expression (DGE)
     • 1st strand cDNA synthesis with biotinylated oligo(dT) primer (commonly starting from mRNA).
     • Preparation of double-stranded cDNA, binding to beads, and digestion with anchoring enzyme.
     • Adaptor ligation and digestion with Mme I (20 bp) or EcoP15I (27 bp).
     • Formation of “di-tags” (di-tags can be used for direct sequencing (DGE)).
     • Concatenation and cloning into plasmid vector (classic sequencing of concatemers).
     Very well established, with rich reference/annotation information. Digital expression profiling by “tag counting”.
     Ref.: Velculescu VE et al. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):368-9, 371.
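
The SAGE steps above can be mimicked in silico: because the cDNA is held on beads by its 3’ end, the tag comes from the anchoring enzyme site closest to the 3’ end, and expression is then read out by counting identical tags. The sketch below assumes the recognition site CATG (the site of NlaIII, a commonly used anchoring enzyme that is not named on the slide) and a 20 bp tag counted from directly after the site. Both choices, the function names, and the toy sequences are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Optional

def sage_tag(cdna: str, anchor_site: str = "CATG", tag_length: int = 20) -> Optional[str]:
    """Tag directly 3' of the 3'-most anchoring enzyme site, or None
    if the site is missing or too close to the 3' end."""
    pos = cdna.rfind(anchor_site)
    if pos == -1:
        return None
    start = pos + len(anchor_site)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def count_tags(cdnas: Iterable[str], anchor_site: str = "CATG", tag_length: int = 20) -> Counter:
    """Digital expression profile: number of cDNAs observed per tag."""
    tags = (sage_tag(c, anchor_site, tag_length) for c in cdnas)
    return Counter(t for t in tags if t is not None)

# Toy cDNAs (sense strand, poly(A) tail omitted); sequences are made up.
gene_a = "AAACATGCCCCCATG" + "ACGTACGTACGTACGTACGT" + "TTTT"
gene_b = "GGGCATG" + "TTGACCTTGACCTTGACCTT" + "AA"
no_site = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

profile = count_tags([gene_a, gene_a, gene_b, no_site])
for tag, n in profile.most_common():
    print(tag, n)
```

Transcripts without an anchoring site yield no tag at all, which is one reason why a single tag-based method gives an uneven view of the transcriptome.
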
  20. Cap Analysis Gene Expression (CAGE)
     • Commonly starting from 50 µg total RNA.
     • 1st strand cDNA synthesis with an NNNNNN primer (covering poly(A)- mRNA and long mRNA).
     • 5’-end selection on beads by Cap Trapper (less bias due to chemical modification of the cap).
     • Adaptor ligation and 2nd strand synthesis.
     • Digestion with Mme I (20 bp) or EcoP15I (27 bp).
     • Isolation of CAGE tags.
     • 3’-end adaptor ligation (Adaptor I, tag, Adaptor II).
     Preferably used for direct sequencing (>4,000,000 tags per run).
     Ref.: Kodzius R et al.: Cap analysis of gene expression: transcription start site mapping and expression profiling. Nature Methods 2006 Mar 3(3): 211-222.
  21. Cap Analysis Gene Expression (CAGE)
     [Figure: a gene locus with transcription factors TF1 to TF3 responding to signals 1 to 3, the transcription start site (TSS), exons 1 to 5, and the capped, polyadenylated mRNA, annotated with the methods ChIP, CAGE tags, RACE, SAGE, microarray, and tiling array/RNA-Seq.]
     • CAGE tags experimentally link transcripts to their promoters.
     • CAGE tags integrate information based on genome annotations.
     • CAGE tags can be linked to whole genome tiling arrays and RNA-Seq data.
     • CAGE tags can be linked to Chromatin IP/ChIP-Seq data.
     • CAGE tags correlate with open chromatin.
     • CAGE tags provide primer information for cloning new transcripts.
  22. Classical DNA Sequencing by Chain-Termination Method
     [Figure: a primer annealed to a DNA template is extended by DNA polymerase in a dNTP/ddNTP mix, with one reaction per nucleotide. The DNA fragments from the primer extension reactions are analyzed by gel electrophoresis or on a capillary sequencer.]
     For over 30 years the most important method in molecular biology.
     Challenged by emerging new sequencing technologies: Next-Generation Sequencing.
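
The logic of the chain-termination method can be shown with a small simulation: each of the four ddNTP reactions produces fragments that end wherever that base is incorporated, and sorting all fragments by length recovers the sequence. This is an idealized sketch in which every position yields exactly one fragment and lengths are counted from the end of the primer. The function names are hypothetical, and the example sequence is the one visible in the slide figure.

```python
from typing import Dict, List

def chain_termination_fragments(sequence: str) -> Dict[str, List[int]]:
    """For each ddNTP reaction (A, C, G, T), the lengths of the extension
    products terminated at that base."""
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(base_by_length[length] for length in sorted(base_by_length))

synthesized = "TGGTTGCTGCCAATGT"  # sequence shown in the slide figure
lanes = chain_termination_fragments(synthesized)
for base, lengths in lanes.items():
    print(f"dd{base}TP lane: fragment lengths {lengths}")
print("read from gel:", read_gel(lanes))
```
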
  23. Next-Generation Sequencing
     Driven by the “$1000 genome”, different companies are on the move to provide new sequencing technologies based on “sequencing by synthesis” or “ligation-based sequencing”. Other approaches may use hybridization methods or physical means in the future.
     Platform | Mb per run/read length | Method
     Roche 454 Sequencing | 100 Mb/250 bp/7 h per run | Emulsion PCR and pyrosequencing
     Illumina (Solexa) | 1300 Mb/32-40 bp/4 days per run | Bridge PCR and sequencing-by-synthesis
     ABI SOLiD | 3000 Mb/35 bp/5 days per run | Emulsion PCR and ligation-based sequencing
     Helicos | 25 to 90 Mb per h/up to 55 bp | Single-molecule detection
     Ref.: Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008 Mar;24(3):133-41. Epub 2008 Feb 11.
     von Bubnoff A. Next-generation sequencing: the race is on. Cell. 2008 Mar 7;132(5):721-3.
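
The table entries can be converted into approximate read counts and daily output, which is the more relevant figure for tag counting. The sketch below uses only the numbers from the table (2008 values) and assumes that every read has the stated length; the function names are illustrative. Helicos is left out because the table gives an hourly rate rather than a per-run yield.

```python
def reads_per_run(megabases: float, read_length: int) -> int:
    """Approximate number of reads if all reads have the same length."""
    return round(megabases * 1e6 / read_length)

def megabases_per_day(megabases: float, run_hours: float) -> float:
    """Average sequence output per 24 hours of run time."""
    return megabases * 24.0 / run_hours

# (platform, Mb per run, read length in bp, run time in hours), from the table above.
platforms = [
    ("Roche 454", 100, 250, 7),
    ("Illumina (Solexa), 32 bp reads", 1300, 32, 4 * 24),
    ("Illumina (Solexa), 40 bp reads", 1300, 40, 4 * 24),
    ("ABI SOLiD", 3000, 35, 5 * 24),
]

for name, mb, length, hours in platforms:
    print(f"{name}: ~{reads_per_run(mb, length):,} reads per run, "
          f"~{megabases_per_day(mb, hours):.0f} Mb per day")
```

For the Illumina row this gives roughly 32 to 41 million reads per run, in the same range as the 40,000,000 clusters per flow cell quoted later in the talk.
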
  24. Example for Ligation-Based Sequencing: ABI SOLiD System
     [Figure: DNA fragments having adaptor sequences (genomic DNA, tag sequencing) and project-specific data analysis (mapping to genome, reference information).]
     Images are the courtesy of ABI and were kindly provided by ABI Japan.
  25. Example for Ligation-Based Sequencing: ABI SOLiD System
     Images are the courtesy of ABI and were kindly provided by ABI Japan.
  26. Example for Ligation-Based Sequencing: ABI SOLiD System
     Images are the courtesy of ABI and were kindly provided by ABI Japan.
  27. Example for Sequencing-by-Synthesis: Illumina 1G System
     [Figure: workflow. DNA per run: 0.1 to 1 µg; addition of 2 adaptors; add to flow cell; preparation of clusters.]
     Images are the courtesy of Illumina and were kindly provided by Illumina Japan.
  28. Example for Sequencing-by-Synthesis: Illumina 1G System
     Cycle 1:
       • Addition of the sequence reagent
       • One base extension reaction
       • Removal of non-incorporated bases
       • Detect fluorescence signal
       • Removal of the fluorescence label
     Cycle 2: repetition of the above reactions.
     Cycles 3, 4, 5, ...: repetition of the above reactions.
     Images are the courtesy of Illumina and were kindly provided by Illumina Japan.
  29. Example for Sequencing-by-Synthesis: Illumina 1G System
     40,000,000 clusters on a flow cell.
     [Figure: images of clusters with scale bars of 20 µm and 100 µm.]
     Images are the courtesy of Illumina and were kindly provided by Illumina Japan.
  30. Where do we go from here?
     • Next-Generation Sequencing will push the genome sequencing field forward in re-sequencing and de novo sequencing (“1000 Genomes Project”).
     • Metagenomics (Environmental Genomics, Ecogenomics, or Community Genomics): direct analysis of genetic materials obtained from environmental samples.
     • Expression profiling: SAGE (DGE), CAGE, PET, RNA-Seq.
     • Analytical applications to identify functional regions/elements in genomes: ChIP-Seq, open chromatin, SNPs, splicing, others to come.
     • Analytical applications in mutation screens.
     • Analytical applications for detection of infectious agents.
  31. Transcriptome Analysis: The Dominance of Noncoding RNA
     • Genome sequencing and annotation did not tell us about the real extent of gene expression!
     • Tiling array experiments and deep sequencing by next-generation sequencing methods indicate that >90% of the genome is expressed.
     • Maybe 40 to 50% of the mRNA is not polyadenylated, and we have not analyzed it yet.
     • Most of the transcripts are potentially noncoding RNAs having unknown (regulatory?) functions.
     • The definition of a “gene” may no longer hold, with many different transcripts derived from the same loci.
     • We do not understand the “hidden layers” regulating the utilization of genomic information.
     Ref.: Mattick JS. Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. Bioessays. 2003;25:930-939.
  32. Example for RNA-Seq in Yeast
     Schizosaccharomyces pombe (fission yeast)
     • Illumina 1G sequencer; average read length 39.1 bases; fragments from poly(A) mRNA.
     • >23 million reads (~60 genome lengths) from proliferating cells.
     • >99 million reads (~190 genome lengths) from five different stages.
     • Covering ~94% of the nuclear and >99% of the mitochondrial genome.
     • Confirmed expression from intergenic regions by RT-PCR.
     • Control experiments using whole genome tiling arrays (25-mer probes at 20 nt intervals) confirmed the identification of novel transcripts (26 out of 453 may encode short proteins).
     Recent publications on the use of RNA-Seq include S. pombe, S. cerevisiae, Arabidopsis, mouse tissues, mouse stem cells, and HeLa S3.
     Ref.: Wilhelm BT et al.: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008 Jun 26;453(7199):1239-43. Epub 2008 May 18.
     Graveley BR. Molecular biology: power sequencing. Nature. 2008 Jun 26;453(7199):1197-8.
  33. Examples for Genome Size (haploid)
     Genome | Length in bp | Estimated gene number
     Phi-X 174 | 5,386 | 10
     Human mitochondrion | 16,569 | 37
     E. coli | 4,639,221 | 4,377
     Saccharomyces cerevisiae | 12,495,682 | 5,770
     Caenorhabditis elegans | 100,258,171 | 19,427
     Arabidopsis thaliana | 115,409,949 | ~28,000
     Drosophila melanogaster | 122,653,977 | 13,379
     Humans | 3.3 x 10^9 | ~20,500
     Amphibians | 10^9 to 10^11 | ?
     Values as of July 2007 (source URL not preserved).
  34. Where are our limitations?
     • Mammalian genome size and transcriptome complexity: enrichment of fragments (e.g. using microarrays), normalization, and longer reads are required.
     • Thus far, uneven representation requires the use of more than one method.
     • Requirements for starting materials (the target is to analyze single cells).
     • No unified cDNA library method: different methods are used depending on RNA length.
     • Very large data files and lack of computational analysis tools.
     • What is transcriptional noise?
     • Research is dominated by “detection” rather than “functional analysis”.
     Ref.: Struhl K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol Biol. 2007 Feb;14(2):103-5.
  35. Present Strategies for Transcriptome Analysis
     • Interest has shifted to next-generation sequencing to profile transcriptional activities.
     • We cannot predict the ends of transcripts, and therefore tag-based approaches to identify start sites and termination sites are needed.
     • Identification of transcription start sites in combination with other information is driving “gene network studies” and “systems biology”.
     • RNA-Seq provides new means for the identification of splice sites and expressed mutations.
     • We do not clone all those new transcripts, but there will be a need for resources for the functional analysis of new transcripts.
     • We are more than ever falling short on the functional analysis of new transcripts. Thus far we have not even analyzed all coding transcripts!
     It is an exciting time to work on transcriptome analysis, offering many challenges and rewards!
  36. Contact:
     Dr. Matthias Harbers
     DNAFORM Inc.
     Leading Venture Plaza-2, 75-1, Ono-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0046, Japan
     E-mail: matthias.harbers@dnaform.jp
     Phone: +81-(0)45-510-0607
     FAX: +81-(0)45-510-0608