Third Generation Sequencing

whole genome analysis
history
needs
steps involved
human genome data
NGS
pyrosequencing
illumina
SOLiD
Ion torrent
PacBio
applications
problems
benefits

Third Generation Sequencing

  1. 1. WHOLE GENOME ANALYSIS R.Priyanka M.Sc Biotechnology
  2. 2. Genome is the entire complement of genetic material of an organism, virus or organelle, or the haploid set of chromosomes in a eukaryotic organism. Whole genome is the complete genome set of an organism. Whole genome sequencing is a laboratory process that determines the complete DNA sequence of an organism’s genome at a single time.
  3. 3. The term "sequence analysis" implies subjecting a DNA or peptide sequence to sequence alignment, sequence databases, repeated sequence searches, or other bioinformatics methods on a computer. Sequence analysis in bioinformatics is an automated, computer-based examination of characteristic fragments, e.g. of a DNA strand.
  4. 4. WHOLE GENOME ANALYSIS Genomic analysis is the identification, measurement or comparison of genomic features such as DNA sequence, structural variation, gene expression, or regulatory and functional element annotation at a genomic scale. Methods for genomic analysis typically require high-throughput sequencing or microarray hybridization and bioinformatics.
  5. 5. • Allan Maxam and Walter Gilbert developed a chemical method of DNA sequencing in 1976-1977. • Because this method was technically complex and made extensive use of hazardous chemicals, it fell out of favor. • Sanger and Coulson developed the chain-termination method in 1977. • It can only be used for fairly short strands (100 to 1000 base pairs); longer sequences must be subdivided into smaller fragments.
  6. 6. • Bacteriophage φX174 was the first genome to be sequenced, in 1977, with 11 genes & 5,386 base pairs (bp). This is a viral genome smaller than those of the T phages, and the virion is polyhedral. It was sequenced by Fred Sanger and colleagues. • Haemophilus influenzae was the first bacterial genome to be sequenced, in 1995, by J. Craig Venter's team. This is a Gram-negative bacterium with 1.8 million bp. • The first nearly complete human genomes sequenced were those of J. Craig Venter, James Watson, a Yoruban individual and Seong-Jin Kim.
  7. 7. WHY WHOLE GENOME SEQUENCING? • Information about coding and non coding part of an organism. • To find out important pathways in microbes. • For evolutionary study and species comparison. • For more effective personalized medicine (why a drug works for person X and not for Y). • Identification of important secondary metabolite pathways (e.g. in plants). • Disease-susceptibility prediction based on gene sequence variation.
  8. 8. 1) Genome sequence assembly 2) Identify repetitive sequences – mask out 3) Gene prediction – train a model for each genome 4) Genome annotation- process of attaching biological information to sequence 5) Metabolic pathways and regulation- to find missing genes 6) Protein 2D gel electrophoresis- to detect translational product 7) Functional genomics 8) Gene location/gene map 9) Self-comparison of proteome 10) Comparative genomics 11) Identify clusters of functionally related genes 12) Evolutionary modeling- to analyze chromosomal arrangement, duplications, predictions can be made
  9. 9. Timeline of Large-Scale Genomic Analysis Lecture 14
  10. 10. • Easy for prokaryotes (single cell) – one gene, one protein • More difficult for eukaryotes (multicell) – one gene, many proteins • Very difficult for Human – short exons separated by non-coding long introns • Gene recognition is by sequence alignment
  11. 11. E.g. 3.1 × 10^9 bp in the human genome. Difficulties: • Small genes are hard to identify • Some genes are rarely expressed and do not have normal codon usage patterns, and are thus hard to detect
  12. 12. Human Genome Data © American Society for Investigative Pathology
      Time period | Turn-around time | Cost per genome
      1990-2003 | ~5 years | ~$3 billion
      2003-2009 | ~6 months | $300,000
      2010-2014 | < 1 month | $3,800/exome, $20,000/WGS
      2015 | 15 minutes | $100
  13. 13. • Recently a number of faster and cheaper sequencing methods have been developed. – In October 2006, the X Prize Foundation, working in collaboration with the J. Craig Venter Science Foundation, established the Archon X Prize for Genomics, intending to award US $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 1,000,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $1,000 per genome". An error rate of 1 in 1,000,000 bases, out of a total of approximately six billion bases in the human diploid genome, would mean about 6,000 errors per genome. For clinical use, such as predictive medicine, the required error rates are currently set by over 1,400 clinical single-gene sequencing tests. The prize was cancelled in August 2013. – Methods are being developed that aim to sequence the entire human genome for $1,000, to allow personal genomics. – One of the most widely used new methods involves the pyrosequencing biochemical reactions (invented by Nyren and Ronaghi in 1996), combined with the massively parallel microfluidics technology invented by the 454 Life Sciences company. This combined technology is called “454 sequencing”.
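
The error estimate on slide 13 is a simple division, and a short sketch makes it easy to try other accuracy targets. The function name and the constant below are illustrative choices, not part of the prize rules.

```python
# Illustrative arithmetic for the X Prize accuracy target quoted on slide 13.
# The names below are chosen for this sketch only.

DIPLOID_HUMAN_BASES = 6_000_000_000  # "approximately six billion bases" (slide 13)


def expected_errors(genome_bases: int, bases_per_error: int) -> float:
    """Expected number of wrong base calls at one error per `bases_per_error` bases."""
    return genome_bases / bases_per_error


if __name__ == "__main__":
    # One error per 1,000,000 bases over a diploid human genome.
    print(expected_errors(DIPLOID_HUMAN_BASES, 1_000_000))
```

Changing the second argument shows how quickly the error count grows when the accuracy target is relaxed.
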
  14. 14. NEXT GENERATION SEQUENCING Roche’s 454 FLX, Ion Torrent, Illumina, ABI’s SOLiD • Array based sequencing • Sequence the full genome of an organism in a few days at a very low cost. • Produce high throughput data in the form of short reads.
  15. 15. Towards ‘Next Generation’ sequencing instruments • Capacity greater than one Gigabase per run • Drastic decrease in costs per genome • sequencing of whole bacterial genomes in a single run • sequencing genomes of individuals • metagenomics: sequencing DNA extracted from environmental samples • looking for rare variants in a single amplified region, in tumors or viral infections • transcriptome sequencing: total cellular mRNA converted to cDNA.
  16. 16. PYROSEQUENCING • This technique is used to sequence DNA by using chemiluminescent enzymatic reactions. Step 1: A single-stranded DNA molecule is prepared by alkali denaturation, and a dNTP is added. Step 2: In DNA synthesis, a dNTP is attached to the 3’ end of the growing DNA strand. DNA polymerase starts elongating by using dNTPs. The two terminal phosphates are released as pyrophosphate (PPi).
  17. 17. Roche’s 454 FLX: Samples are collected from the library; adapters are added to both ends; individual fragments are captured using the adapters.
  18. 18. Fragments are amplified by PCR; the picotiter plate is loaded (sample loading takes 8 h); sequencing is carried out and the data are analysed.
  19. 19. Step 3: ATP sulfurylase is normally used in sulfur assimilation: it converts ATP and inorganic sulfate to adenosine 5’-phosphosulfate (APS) and PPi. In pyrosequencing the reaction runs in reverse, converting APS and the released PPi into ATP. Luciferase is the enzyme that causes fireflies to glow. It uses luciferin and ATP as substrates, converting luciferin to oxyluciferin and releasing visible light.
  20. 20. • The four dNTPs are added one at a time • The amount of light released is proportional to the number of nucleotides added to the new DNA strand. Thus, if the sequence has 2 A’s in a row, both get added and twice as much light is released. Step 4: After the reaction has completed, apyrase is added to destroy any leftover dNTPs. • The pyrosequencing machine cycles between the 4 dNTPs many times, building up the complete sequence. About 300 bp of sequence is possible (as compared to 800-1000 bp with Sanger sequencing). Step 5: The light is detected with a charge-coupled device (CCD) camera- Pyrosequencing method
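
The cycle described on slides 16 to 20 can be sketched as a toy simulation: the four dNTPs are dispensed one at a time, and the light recorded for each dispensation is proportional to how many bases were incorporated. This is a minimal model under stated assumptions, not the instrument's software. The flow order `TACG`, the `max_flows` cap and the function names are assumptions made for the example, and `read` stands for the strand being synthesised.

```python
from itertools import cycle


def flowgram(read: str, flow_order: str = "TACG", max_flows: int = 400):
    """Simulate one-at-a-time dNTP dispensation against the strand being built.

    Returns a list of (nucleotide, light_units) pairs. A homopolymer run of
    length n yields n light units in a single flow, as described on slide 20.
    """
    signals = []
    pos = 0
    for i, nt in enumerate(cycle(flow_order)):
        if pos >= len(read) or i >= max_flows:
            break
        run = 0
        # Every consecutive matching base is incorporated in the same flow.
        while pos < len(read) and read[pos] == nt:
            run += 1
            pos += 1
        signals.append((nt, run))
    return signals


def call_bases(signals) -> str:
    """Rebuild the sequence from the flow signals."""
    return "".join(nt * n for nt, n in signals)


if __name__ == "__main__":
    signals = flowgram("AAGT")
    print(signals)
    print(call_bases(signals))
```

In this idealised model the signal is an exact integer. On a real instrument the light intensity is noisy, which is why long homopolymer runs are a typical source of indel errors for this chemistry.
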
  21. 21. Illumina Sequencing • Sample preparation: DNA is extracted and purified. • Tagmentation: the transposome fragments the DNA and tags the fragment ends. • Two different adapters are added, one on each end of the DNA, which is then bound to a slide (flow cell) coated with the complementary sequences for each primer. • This allows “bridge PCR”, producing a small spot of amplified DNA on the slide. • The slide contains millions of individual DNA spots. The spots are visualized during the sequencing run, using the fluorescence of the nucleotide being added.
  22. 22. Illumina Sequencing Chemistry • Cluster generation: process where each fragment is isothermally amplified. • Reverse strands are cleaved and washed away, leaving only the forward strands. • This method uses the basic Sanger idea of “sequencing by synthesis” of the second strand of a DNA molecule. Starting with a primer, new bases are added one at a time, with fluorescent tags used to determine which base was added. • The fluorescent tags block the 3’-OH of the new nucleotide, and so the next base can only be added when the tag is removed. • The cycle is repeated 50-100 times.
  23. 23. SOLiD Sequencing 1. Fragment library- 2 types of fragments(single & mate paired) 2. Ligation of adapters 3. Substrate preparation 4. Hybridization – clonal amplification 5. emulsion PCR 6. Di-base probes(fluorescently labelled) are added 7. Fluorescence is measured
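
Slide 23 mentions di-base probes without spelling out how two bases map to one fluorescence signal. The sketch below uses the commonly described four-colour two-base encoding, in which each colour reports on a pair of adjacent bases. The slides do not give this detail, so treat the mapping and all names here as assumptions made for illustration.

```python
# Commonly described SOLiD-style two-base ("colour space") encoding.
# Assumption for this sketch: with A=0, C=1, G=2, T=3 the colour of a
# dinucleotide is the XOR of the two base codes, so AA/CC/GG/TT give colour 0.

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq: str) -> list:
    """One colour per overlapping pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list) -> str:
    """Recover the bases from the known first base and the colour calls."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    colors = encode_colors("ATGGC")
    print(colors)
    print(decode_colors("A", colors))
```

Because every base is covered by two overlapping colour calls, a single wrong colour changes every base decoded after it, which makes isolated measurement errors easier to distinguish from true variants.
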
  24. 24. Emulsion preparation: • Water + capture beads + enzyme + DNA fragments + synthetic oil is vigorously shaken, so that water droplets form around the beads, i.e. an emulsion. • Each plate has 1.6 million wells • This is designed in such a way that only one bead will fit in each well
  25. 25. STEPS
  26. 26. CMOS- complementary metal oxide semiconductor
  27. 27. • The semiconductor chip contains millions of wells covered by millions of pixels. • The chip captures chemical information from the DNA sequence and translates it into digital information. • DNA is cleaved into millions of fragments; each fragment is attached to its own bead, and the wells are flooded with DNA nucleotides. • For each base incorporated (e.g. G pairing with C) a hydrogen ion is released, changing the pH of the solution in the well. • The chemical change is read on the chip by an ion-sensitive layer beneath the well. • If the nucleotide is not complementary to the template strand (e.g. G with T), no ion is released. • This process occurs simultaneously in millions of wells.
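
The rule on slide 27 (a hydrogen ion is released only when the flooded nucleotide is complementary to the template) can be written as a toy model of a single well. The function name, the flow string and the idea of counting ions as integers are assumptions of this sketch. A real chip measures a voltage change, not an ion count.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def ion_flow_signals(template: str, flows: str):
    """For each nucleotide flooded over the well, count the H+ ions released.

    An ion is released for every template base that pairs with the flooded
    nucleotide; a non-complementary nucleotide releases none.
    """
    signals = []
    pos = 0
    for nt in flows:
        ions = 0
        while pos < len(template) and COMPLEMENT[template[pos]] == nt:
            ions += 1
            pos += 1
        signals.append((nt, ions))
    return signals


if __name__ == "__main__":
    print(ion_flow_signals("G", "C"))  # complementary pair, as in the G-C example
    print(ion_flow_signals("G", "T"))  # non-complementary, as in the G-T example
    print(ion_flow_signals("GGA", "CTAG"))
```
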
  28. 28. CHIP MACHINE
  29. 29. PacBio This is a Natural process
  30. 30. • Each SMRT (single molecule real time) cell has 10,000 zero-mode waveguides • Step 1: DNA polymerase is immobilized at the bottom • Step 2: phospholinked nucleotides are added • Step 3: each nucleotide is labelled with a different coloured fluorophore • Step 4: the base is detected from the light pulse produced on incorporation, up to 1000-fold.
  31. 31. Single molecule real time(SMRT) sequencing
  32. 32. STEPS
  33. 33. Instrument | PacBio | Ion Torrent | 454 | Illumina | SOLiD
      Method | Single molecule in real time | Ion semiconductor | Pyrosequencing | Synthesis | Ligation
      Read length | 3 kb | 200 bp | 700 bp | 50 to 250 bp | 50+35 or 50+50 bp
      Error type | Indel | Indel | Indel | Substitution | A-T bias
      Error % | 13 | ~1 | ~0.1 | ~0.1 | ~0.1
      Reads per run | 35,000-75,000 | Up to 4M | 1M | Up to 3.2G | 1.2 to 1.4G
      Time/run | 30 min to 2 h | 2 h | 24 h | 1-10 days | 1 to 2 weeks
      Cost/million bases ($) | 2 | 1 | 10 | 0.05 to 0.15 | 0.13
      Advantages | Longest read length & fast | Less expensive & fast | Long read size & fast | High sequence yield; favorable cost and accuracy | Low cost per base
      Disadvantages | Low yield at high accuracy; expensive | Homopolymer errors | Runs are expensive; homopolymer errors | Expensive | Slower than other methods, read lengths, longevity of platform
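
The cost-per-million-bases row on slide 33 can be turned into a rough per-genome figure. The sketch below multiplies those table values by the 3.1 × 10^9 bp human genome size from slide 11. The `coverage` parameter and all names are assumptions of the example, and the result covers raw sequencing output only, ignoring instruments, labour and analysis.

```python
# (low, high) cost in US dollars per million bases, copied from slide 33.
COST_PER_MB_USD = {
    "PacBio": (2.0, 2.0),
    "Ion Torrent": (1.0, 1.0),
    "454": (10.0, 10.0),
    "Illumina": (0.05, 0.15),
    "SOLiD": (0.13, 0.13),
}

HUMAN_GENOME_BP = 3_100_000_000  # 3.1 x 10^9 bp (slide 11)


def raw_cost_usd(platform: str, coverage: float = 1.0):
    """Return the (low, high) cost of reading the genome `coverage` times over."""
    low, high = COST_PER_MB_USD[platform]
    megabases = HUMAN_GENOME_BP / 1_000_000 * coverage
    return low * megabases, high * megabases


if __name__ == "__main__":
    for name in COST_PER_MB_USD:
        low, high = raw_cost_usd(name, coverage=30)  # 30x is an assumed depth
        print(f"{name}: ${low:,.0f} to ${high:,.0f}")
```
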
  34. 34. Equipment | Applications
      454 | Whole genome sequencing, resequencing, metagenomics
      Ion Torrent | Small de novo genome sequencing, amplicon sequencing, metagenomics, validation
      Illumina | Small de novo genome sequencing, cytogenetic analysis, metagenomics, validation, transcriptome sequencing (RNA-Seq), whole exome sequencing, whole genome sequencing
      SOLiD | Transcriptome sequencing (RNA-Seq), whole exome sequencing, whole genome sequencing
      Pacific Biosciences | Small genomes, epigenomics
  35. 35. Assembly Problems • There are random mutations (either naturally occurring cell-to-cell variation or generated by PCR or cloning). • Sometimes the cloning vector itself gets sequenced. • Most genomes contain multiple copies of many sequences. • Getting rid of vector sequences is easy once the problem is recognized. • Repeat sequence DNA is very common in eukaryotes. High quality sequencing is helpful. • Sequencing errors, bad data, random mutations, misreadings. • Data are produced in the form of short reads. • Short reads have low quality bases and vector/adaptor contamination. • Several genome assemblers are available, but their performance has to be checked to find the best one. • Quality control • Patent and licensing restrictions
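
Two of the problems on slide 35, adaptor contamination and low quality bases, are usually handled by trimming reads before assembly. The following is a minimal sketch of both steps using exact matching only. The adapter string in the usage example, the thresholds and the function names are placeholders for illustration, and production trimmers also tolerate mismatches.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut the read at a full adapter match, or at an adapter prefix on the 3' end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for the longest adapter prefix that ends the read.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_low_quality(read: str, quals: list, min_q: int = 20):
    """Drop trailing bases whose quality score is below `min_q`."""
    end = len(read)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return read[:end], quals[:end]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # placeholder adapter sequence for the example
    print(trim_adapter("ACGTACGTAGATCGG", adapter))
    print(trim_low_quality("ACGTAC", [30, 30, 30, 30, 10, 5]))
```
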
  36. 36. Short Reads: 454 FLX, SOLiD, Illumina, Ion Torrent. Low cost & less time. Genomic fragments.
  37. 37. BENEFITS • Treatments based on genomics • Improve outcomes • Faster diagnosis • More precise prognosis • Effective therapy • Reduce healthcare costs
  38. 38. Applications • Oncology – Determine the preferred therapeutic agent for each tumor – Ascertain which patients are most likely to benefit from a given therapy • Molecular Pathology – Disease-specific tests – Cost of a single test: $100 - $5,000 – Individual test validation and performance • Medicine – Human genomic sequence in public databases allows rapid identification of disease genes by positional cloning – Inherited diseases can be identified
