RNA-Seq for gene finding

Transcript discovery and gene model correction using next generation sequencing data. Sucheta Tripathy, VBI, 11th Nov 2010

Brief History of Sequencing: Sanger dideoxy sequencing method (1977). Maxam-Gilbert chemical degradation method (1977). Two labs that owned automated sequencers: 1. Leroy Hood at Caltech, 1986 (commercialized by AB); 2. Wilhelm Ansorge at EMBL, 1986 (commercialized by Pharmacia-Amersham and GE Healthcare).

Brief History of Sequencing: Hypoxanthine-guanine phosphoribosyltransferase (HGPRT); Alu sequences.

Brief History of Sequencing: 1991, a patent filed by EMBL on media-less, solid-support-based sequencing. 1996, Hitachi Laboratory developed a high-throughput capillary array sequencer.

NextGen Sequencing Methods: 454 sequencing methods (2006), based on the principles of pyrophosphate detection (1985, 1988); Illumina (Solexa) Genome sequencing methods (2007); Applied Biosystems ABI SOLiD System (2007); Helicos single-molecule sequencing (HeliScope, 2007); Pacific Biosciences single-molecule real-time (SMRT) technology, 2010; Sequenom for nanotechnology-based sequencing; BioNanomatrix nanofluidics; RNAP technology.

Figure 1. (A) Outline of the GS 454 DNA sequencer workflow. Library construction (I) ligates 454-specific adapters to DNA fragments (indicated as A and B) and couples amplification beads with DNA in an emulsion PCR to amplify fragments before sequencing (II). The beads are loaded into the picotiter plate (III). (B) Schematic illustration of the pyrosequencing reaction which occurs on nucleotide incorporation to report sequencing-by-synthesis. (Adapted from http://www.454.com.)

Outline of the Illumina Genome Analyzer workflow. Similar fragmentation and adapter ligation steps take place (I), before applying the library onto the solid surface of a flow cell. Attached DNA fragments form ‘bridge’ molecules which are subsequently amplified via an isothermal amplification process, leading to a cluster of identical fragments that are subsequently denatured for sequencing primer annealing (II). Amplified DNA fragments are subjected to sequencing-by-synthesis using 3′ blocked labelled nucleotides (III). (Adapted from the Genome Analyzer brochure, http://www.solexa.com.)

(A) Primers hybridise to the P1 adapter within the library template. A set of four fluorescence-labelled di-base probes competes for ligation to the sequencing primer. These probes have partly degenerated DNA sequence (indicated by n and z). Specificity of the di-base probe is achieved by interrogating the first and second base in each ligation reaction (CA in this case for the complementary strand). (B) Sequence determination by the SOLiD DNA sequencing platform is performed in multiple ligation cycles, using different primers, each one shorter than the previous one by a single base. The number of ligation cycles determines the eventual read length, whilst for each sequence tag, six rounds of primer reset occur [from primer (n) to primer (n − 4)]. (Adapted and modified from http://www.appliedbiosystems.com.)

Cost Adapted from Eric Lander, 2010

Throughput: Standard ABI “Sanger” sequencing: 96 samples/day; read length ~650 bp; total = 450,000 bases of sequence data. 454 was the game changer! ~400,000 different templates (reads)/day; read length ~250 bp; total = 100,000,000 bases of sequence data!
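As a rough check of the throughput figures on this slide, daily output can be estimated as reads per day times read length. The sketch below is illustrative only; the function name is hypothetical. The 454 total is reproduced exactly. For Sanger, 96 reads of ~650 bp give 62,400 bases, so the quoted 450,000 bases would correspond to roughly seven 96-sample runs per day. That is an inference from the arithmetic, not something the slide states.

```python
def bases_per_day(reads_per_day: int, read_length_bp: int) -> int:
    """Estimate sequence output as number of reads times read length."""
    return reads_per_day * read_length_bp


# Figures taken from the slide.
sanger_one_run = bases_per_day(96, 650)        # one 96-sample run
pyro_454 = bases_per_day(400_000, 250)         # 454 reads per day

# Assumption: the slide's 450,000-base Sanger total covers several runs.
sanger_quoted_total = 450_000
implied_runs = sanger_quoted_total / sanger_one_run

print(sanger_one_run)          # 62400
print(pyro_454)                # 100000000
print(round(implied_runs, 1))  # 7.2
print(round(pyro_454 / sanger_quoted_total))  # 222 (fold increase)
```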

Throughput: 454 Life Sciences/Roche Genome Sequencer FLX: currently produces 400-600 million bases per day per machine; published 1 million bases of Neanderthal DNA in 2006; in May 2007 published the complete genome of James Watson (3.2 billion bases, ~20x coverage). Solexa/Illumina: 10 GB per machine/week; in May 2008 published complete genomes for 3 HapMap subjects (14x coverage). ABI SOLiD: 20 GB per machine/week.

RNA-Seq: Catalogue all species of transcripts (mRNA, non-coding RNA, small RNA). Determine splicing patterns or other post-transcriptional modifications. Quantify the expression levels.

Zhong Wang et al; Nat. Rev. Genetics, 2009

Other Applications: SNP detection; splice variant discovery; identification of miRNA targets; TF binding sites; genome methylation patterns; RNA editing; metagenomic projects; gene expression analysis.

Difference with other expression sequencing: EST: low throughput, expensive, NOT quantitative. SAGE, CAGE, MPSS: high throughput, digital gene expression levels, but based on expensive Sanger sequencing methods; only a portion of the transcript is analyzed; isoforms are indistinguishable.

Advantages: Little or no background noise. Sensitive to isoform discovery. Both low and highly expressed genes can be quantified. Highly reproducible.

Data Analysis: Mapping reads to the reference assembly. Filtering output: reads mapping > x number of times. Downstream data analysis.
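The filtering step above (discarding reads that map more than x times) can be sketched as follows. This is a minimal illustration, not the method of any particular tool. The tuple layout `(read_id, chrom, pos)`, the function name, and the threshold are all assumptions made for the example.

```python
from collections import Counter


def filter_multimappers(alignments, max_hits):
    """Keep alignments only for reads that map at most `max_hits` times.

    `alignments` is a list of (read_id, chrom, pos) tuples, one per
    reported mapping location (a hypothetical, simplified record format).
    """
    hits = Counter(read_id for read_id, _chrom, _pos in alignments)
    return [a for a in alignments if hits[a[0]] <= max_hits]


# Toy input: r1 maps once, r2 twice, r3 three times.
alignments = [
    ("r1", "chr1", 100),
    ("r2", "chr1", 250),
    ("r2", "chr3", 900),
    ("r3", "chr2", 40),
    ("r3", "chr2", 4000),
    ("r3", "chr5", 77),
]

kept = filter_multimappers(alignments, max_hits=2)
print(len(kept))  # 3 (r1 once, r2 twice; r3 is dropped)
```

Setting `max_hits=1` would keep only uniquely mapping reads, which is a stricter but common choice.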

Mapping: One or two mismatches in reads of < 35 bases. One insertion/deletion. K-mer based seeding.
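K-mer based seeding with a mismatch limit can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm of any specific aligner. It indexes every k-mer of the reference, uses non-overlapping k-mers from the read as exact seeds, and accepts a candidate position only if the whole read differs from the reference by at most two mismatches. Insertions and deletions are not handled. All function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, index, k, max_mismatches=2):
    """Return reference start positions where the read aligns
    with at most `max_mismatches` substitutions (no indels)."""
    hits = set()
    # Non-overlapping seeds: with 3 seeds and <= 2 mismatches,
    # at least one seed must match exactly.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            if hamming(read, window) <= max_mismatches:
                hits.add(start)
    return sorted(hits)


reference = "ACGTTGCAAGGCTTACGGATCCGATTACAGGCATGCAAT"
original = reference[5:23]                    # 18-base fragment
read = original[:8] + "A" + original[9:]      # introduce one mismatch (T -> A)

index = build_kmer_index(reference, k=6)
print(map_read(read, reference, index, k=6))  # [5]
```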

Transcript abundance.

Integrated Pipeline

ERANGE: a full package for RNA-Seq and ChIP-Seq data analysis.

DESeq (used by edgeR package)

An overview of the MapSplice pipeline. © The Author(s) 2010. Published by Oxford University Press. Wang K et al. Nucl. Acids Res. 2010;38:e178-e178
