Introduction to second generation sequencing


An introduction to second-generation sequencing, focusing on basic production informatics: the approaches to raw data conversion and quality control will be discussed.

Notes
  • http://2.bp.blogspot.com/_BPr6hpMG0tg/TSZdkYDcRvI/AAAAAAAAAjY/ReScIkWNySg/s1600/drink.jpg
  • Chain-termination (Sanger) sequencing: a PCR-like extension reaction in which a labeled, chain-terminating nucleotide is incorporated at random and stops the extension. The resulting fragments of different lengths are separated on a gel, and the sequence can be read manually from the labeled end nucleotides.
  • Some of you have already done library prep, so you have a feel for how realistic 3 h 10 min is for that step. This seminar series goes through the analysis steps required to answer the question the data was generated for, so by the end of it you will also have a feel for how realistic 30 minutes is for the data analysis.
  • http://www.helicosbio.com/Technology/TrueSingleMoleculeSequencing/tabid/64/Default.aspx
  • http://www.nanoporetech.com/sections/index/82
  • Transcript of "Introduction to second generation sequencing"

    1. [MIT]<br />Introduction to 2GS data analysis<br />Drink faster!<br />June 23, 2011<br />
    2. Production Informatics and Bioinformatics<br />Produce raw sequence reads<br />Basic Production Informatics<br />Map to genome and generate raw genomic features (e.g. SNPs)<br />Advanced Production Informatics<br />Analyze the data; uncover the biological meaning<br />Bioinformatics<br />Research<br />Per one-flowcell project<br />
    3. First Generation: Sanger sequencing<br /><ul><li>Second Generation: amplified molecule sequencing </li></ul>Third Generation: single molecule sequencing<br />Brief history of sequencing<br />*<br />*<br />* Discussion about category<br />
    4. What steps are involved in sequencing?<br />Sequencing by synthesis (SBS) technology<br />Fragmentation<br />Library generation<br />Amplification<br />Sequencing<br />Analysis<br />Illumina Marketing:<br />“3h 10 minutes wet-lab<br />30 minutes dry lab”<br />
    5. Illumina Sequencing: Library + Amplification<br />“Illumina Sequencing Technology” booklet<br />
    6. Illumina Sequencing: Synthesis + Imaging<br />“Illumina Sequencing Technology” booklet<br />
    7. Output: 1.5 terabytes of data<br />Inspired by anzska information booklet<br />
    8. Sequencer Output Conversion: Production Informatics<br />1.5 TB data: 6 billion clusters with 100 bp reads<br />= 600 billion data points<br />HiSeq<br />CASAVA<br />…<br />× read length<br />For HiSeq: images are converted to flat files (*.bcl or *.cif)<br />visualpharm.com<br />Maysoft<br />
    9. Multiplexing<br />6 billion reads:<br />750 million reads per lane<br />Currently 12-plex (soon 96-plex):<br />One run<br />Oliver Twardowski<br />
    10. Demultiplexing<br />CASAVA<br />…<br />…<br />× samples<br />× read length<br />visualpharm.com<br />
    11. CASAVA 1.8.0 program call<br />configureBclToFastq.pl<br /> --input-dir Data/Intensities/BaseCalls/<br /> --output-dir Data/Unaligned<br /> --sample-sheet SampleSheet.csv<br /> --use-bases-mask Y100,I6nn,Y100 > file.log 2>&1<br />cd Data/Unaligned<br />qsub -pe make 16 -j y -v $MYPATH -o qsub.out -cwd -N fastq -b y<br /> make -j 16<br />Runtime: ~6 h<br />
    12. Fastq files<br />@HWI-ST301_0112:1:1:1169:2044#0/1<br />CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT<br />+HWI-ST301_0112:1:1:1169:2044#0/1<br />dddcdd^dd`acacdacd`ecdedabdcdddcc```bTa<br />36 36 36 35 28 …<br />ASCII @ .. ~<br />DEC 64 .. 126<br />PHRED 0 .. 62<br />Phred scores are estimates only!<br />Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. PMID: 20015970<br />
    13. Fastq – PHRED quality<br />Pathological<br />
    14. Fastq: Quality control<br />Base-pair quality score<br />Adapter contamination<br />Uneven amplification<br />
    15. Three things to remember<br />Don’t be fooled by marketing<br />Fastq files are not directly usable<br />Basic run QC can be done from the fastq files<br />“All modern genomics projects are now bottlenecked at the stage of data analysis rather than data production”<br /> Ewan Birney<br /> European Bioinformatics Institute<br />Wellcome Trust<br />David S. Roos. Bioinformatics--Trying to Swim in a Sea of Data. Science 16 February 2001: Vol. 291 no. 5507 pp. 1260-1261. DOI: 10.1126/science.291.5507.1260<br />
    16. Next Week:<br />Abstract: This session will focus on identifying SNPs from whole-genome, exome-capture or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.<br />
    17. Walk-in clinic<br />
    18. First Generation: Sanger sequencing<br /><ul><li>Second Generation: amplified molecule sequencing </li></ul>Third Generation: single molecule sequencing<br />Brief history of sequencing<br />*<br />*<br />* Discussion about category<br />
    19. Helicos<br />true Single Molecule Sequencing (tSMS)™ technology<br />Sequencing by synthesis, but sensitive enough that no amplification is needed<br />
    20. Life Technologies - Ion Torrent<br />A hydrogen ion is released when a nucleotide is incorporated and is measured by a semiconductor sensor<br />The base is identified by the nucleotide wash cycle with which the signal coincides<br />
    21. PacBio<br />Immobilized polymerase at the bottom of a well<br />Fluorescent nucleotides diffuse freely; when one is incorporated it is held in place for tens of milliseconds, and this signal is recorded<br />No fixed upper limit on read length<br />http://www.pacificbiosciences.com/smrt-biology/smrt-technology?page=4<br />
    22. Nanopore<br />The molecule is drawn through a pore, and the change in ionic current across the membrane caused by the different nucleotides is recorded.<br />http://www.nanoporetech.com/sections/index/82<br />
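Slide 8's headline number follows from multiplying clusters by read length, since each cluster yields one base call per sequencing cycle. A minimal Python check of that arithmetic, with the figures copied from the slide; treating 1.5 TB as decimal terabytes (10^12 bytes) is an assumption, so the bytes-per-call figure is only a rough average over everything the run writes.

```python
# Throughput arithmetic from slide 8 (figures copied from the slide).
CLUSTERS = 6_000_000_000       # clusters per run
READ_LENGTH = 100              # bases per read, i.e. sequencing cycles
RUN_OUTPUT_BYTES = 1.5e12      # 1.5 TB, assuming decimal terabytes


def data_points(clusters: int, read_length: int) -> int:
    """Each cluster produces one base call per cycle."""
    return clusters * read_length


total = data_points(CLUSTERS, READ_LENGTH)
bytes_per_call = RUN_OUTPUT_BYTES / total

print(f"{total:,} base calls")                      # 600,000,000,000 base calls
print(f"{bytes_per_call:.1f} bytes per base call")  # 2.5 bytes per base call
```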
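The multiplexing figures on slide 9 can be checked the same way. This sketch assumes that every barcoded sample in a lane receives an equal share of the reads, which real pools only approximate.

```python
# Multiplexing arithmetic from slide 9 (figures copied from the slide).
READS_PER_RUN = 6_000_000_000
READS_PER_LANE = 750_000_000


def lanes_per_run(reads_per_run: int, reads_per_lane: int) -> int:
    return reads_per_run // reads_per_lane


def reads_per_sample(reads_per_lane: int, plex: int) -> float:
    """Ideal yield if all `plex` samples in a lane are pooled in equal amounts."""
    return reads_per_lane / plex


print(lanes_per_run(READS_PER_RUN, READS_PER_LANE))  # 8
for plex in (12, 96):
    print(f"{plex}-plex: {reads_per_sample(READS_PER_LANE, plex):,.0f} reads per sample")
# 12-plex: 62,500,000 reads per sample
# 96-plex: 7,812,500 reads per sample
```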
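Slide 10 shows CASAVA splitting each lane into per-sample files. The core idea can be sketched as matching each read's index (barcode) read against the barcodes in the sample sheet. This is not CASAVA's implementation: the one-mismatch tolerance, the rejection of ambiguous matches, the `Undetermined` bin name, the example barcodes and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Mismatches between two barcodes; any length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index_read: str, barcodes: dict[str, str], max_mismatches: int = 1) -> str:
    """Return the only sample whose barcode is within `max_mismatches`, else 'Undetermined'."""
    hits = [sample for barcode, sample in barcodes.items()
            if hamming(index_read, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "Undetermined"


def demultiplex(reads, barcodes: dict[str, str], max_mismatches: int = 1) -> dict[str, list[str]]:
    """Bin (read_id, index_read) pairs by sample."""
    bins = defaultdict(list)
    for read_id, index_read in reads:
        bins[assign_sample(index_read, barcodes, max_mismatches)].append(read_id)
    return dict(bins)


# Hypothetical sample sheet: 6 bp barcodes, matching the I6 part of the bases mask.
barcodes = {"ATCACG": "sample_1", "CGATGT": "sample_2"}
reads = [("r1", "ATCACG"), ("r2", "ATCACC"), ("r3", "GGGGGG")]
print(demultiplex(reads, barcodes))
# {'sample_1': ['r1', 'r2'], 'Undetermined': ['r3']}
```

Rejecting index reads that fall within the tolerance of more than one barcode keeps a single sequencing error from silently moving a read into the wrong sample.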
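The `--use-bases-mask` argument on slide 11 states what every sequencing cycle is used for. In Illumina's convention, `Y` marks a cycle used for the read, `I` an index (barcode) cycle, and `n` a cycle to ignore, so `Y100,I6nn,Y100` means 100 read cycles, then 6 index cycles plus 2 ignored cycles, then another 100 read cycles. The parser below is a hypothetical helper for sanity-checking a mask before starting a roughly six-hour conversion; it only understands letter-plus-count tokens, not every form the real tool accepts.

```python
import re

# Per-cycle roles of the mask letters (matched case-insensitively).
ROLES = {"Y": "read", "I": "index", "N": "skip"}
TOKEN = re.compile(r"([YIN])(\d*)", re.IGNORECASE)


def expand_mask(mask: str) -> list[list[str]]:
    """Expand 'Y100,I6nn,Y100' into one list of per-cycle roles per read segment."""
    segments = []
    for segment in mask.split(","):
        if not re.fullmatch(r"(?:[YIN]\d*)+", segment, re.IGNORECASE):
            raise ValueError(f"unsupported mask segment: {segment!r}")
        roles = []
        for letter, count in TOKEN.findall(segment):
            # A letter without a count stands for a single cycle.
            roles.extend([ROLES[letter.upper()]] * int(count or 1))
        segments.append(roles)
    return segments


segments = expand_mask("Y100,I6nn,Y100")
print([len(s) for s in segments])     # [100, 8, 100]
print(sum(len(s) for s in segments))  # 208
print(segments[1].count("index"))     # 6
```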
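Slide 12's encoding table translates directly into code: each FASTQ record spans four lines, and each quality character encodes a Phred score as its ASCII code minus an offset (64 for the Illumina variant shown, so `@`..`~` maps to 0..62). A Phred score Q stands for an estimated error probability P = 10^(-Q/10). The reader and decoder below are a minimal sketch that assumes unwrapped four-line records; the offset is a parameter because the Sanger variant described by Cock et al. uses 33 instead.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from unwrapped four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield header.rstrip("\n")[1:], seq, qual


def decode_quality(qual: str, offset: int = 64) -> list[int]:
    """Quality string -> Phred scores (offset 64 as on the slide, 33 for Sanger FASTQ)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(phred: int) -> float:
    """Estimated probability that the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


record = ["@read1\n", "ACGT\n", "+\n", "dddc\n"]  # hypothetical record
for read_id, seq, qual in parse_fastq(record):
    scores = decode_quality(qual)
    print(read_id, seq, scores)                   # read1 ACGT [36, 36, 36, 35]
    print(f"{error_probability(scores[0]):.1e}")  # 2.5e-04
```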
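The three checks listed on slide 14 can each be approximated from the fastq data alone, in line with slide 15. The sketch below is illustrative rather than a replacement for a dedicated QC tool: adapter contamination is approximated by searching for the first bases of an assumed adapter sequence (`AGATCGGAAGAGC`, the common start of Illumina adapters; substitute the one your library prep actually used), and uneven amplification by the fraction of exactly duplicated reads. The function names and toy reads are hypothetical.

```python
from collections import Counter

ADAPTER = "AGATCGGAAGAGC"  # assumed adapter start; replace with your kit's sequence


def mean_quality_per_position(records, offset=64):
    """Mean Phred score at each cycle, over a list of (sequence, quality) pairs."""
    totals, counts = [], []
    for _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read reaching this cycle
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def adapter_fraction(records, adapter=ADAPTER, probe_len=8):
    """Fraction of reads containing the first `probe_len` adapter bases."""
    if not records:
        return 0.0
    probe = adapter[:probe_len]
    return sum(probe in seq for seq, _ in records) / len(records)


def duplicate_fraction(records):
    """Fraction of reads whose exact sequence was already seen (crude amplification-bias signal)."""
    if not records:
        return 0.0
    counts = Counter(seq for seq, _ in records)
    return sum(n - 1 for n in counts.values()) / len(records)


# Hypothetical toy reads: one carries adapter, two are exact duplicates.
records = [
    ("ACGTAGATCGGA", "dddddddddddd"),
    ("ACGTACGTACGT", "ddddcccc^^^^"),
    ("ACGTACGTACGT", "dddddddddddd"),
]
print(mean_quality_per_position(records)[8])  # 34.0
print(round(adapter_fraction(records), 3))    # 0.333
print(round(duplicate_fraction(records), 3))  # 0.333
```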
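Slide 20 describes base calling by timing: nucleotides are washed over the chip one type at a time, so the base is identified by which wash (flow) produced a signal, and a larger signal means several identical bases were incorporated in a row. The decoder below is an idealized sketch: the repeating flow order `TACG` is an assumption, and rounding each signal to the nearest integer ignores the noise that makes long homopolymers hard to call in practice.

```python
def decode_flows(signals, flow_order="TACG"):
    """Convert per-flow signal intensities into a base sequence.

    Flow i delivers nucleotide flow_order[i % len(flow_order)]; the signal,
    rounded to the nearest integer, is taken as the number of those bases
    incorporated (0 = no incorporation, 2 = a two-base homopolymer, ...).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)


# Hypothetical normalized signals for six flows (T, A, C, G, T, A).
print(decode_flows([1.02, 0.05, 2.1, 0.9, 0.0, 1.1]))  # TCCGA
```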