Creating a SNP calling pipeline

Creating a SNP calling pipeline in the context of the Potato Genome Sequencing Consortium project.

Published in: Education

  1. 1. Potato SNPs. Dan Bolser and David Martin. Next Gen Bug, Dundee, 01/18/2010
  2. 2. Aims of the work: 1) Learn about handling RNASeq: create a SNP calling pipeline. 2) Select SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA).
  3. 3. Creating a SNP calling pipeline
  4. 4. 4
  5. 5. Align (using BWA)
     1) Index the potato genome assembly: bwa index [-a bwtsw|div|is] [-c] <in.fasta>
     2) Perform the alignment: bwa aln [options] <in.fasta> <in.fq>
     3) Output results in SAM format (single end): bwa samse <in.fasta> <in.sai> <in.fq>
  6. 6. Align (using Bowtie)
     1) Index the potato genome assembly: bowtie-build [options] <in.fasta> <ebwt>
     2) Perform the alignment and output results: bowtie [options] <ebwt> <in.fq>
  7. 7. 7
  8. 8. Convert (using SAMtools)
     1) Convert SAM to BAM for sorting: samtools view -S -b <in.sam>
     2) Sort BAM for SNP calling: samtools sort <in.bam> <out.bam.s>
     Alignments are both compressed for long-term storage and sorted for variant discovery.
  9. 9. 9
  10. 10. Coverage profiles / Depth vectors
  11. 11. SAMtools... Dump a coverage profile: samtools mpileup -f <in.fasta> <my.bam.s>
      P1 244526 A 10 ...,.,,,.. BBQa`aaaa[
      P1 244527 A 10 ...,.,,,.. BBZ_`^a_a[
      P1 244528 C 10 .$.$.,.,,,.. >>RaZ`aaaa
      P1 244529 C 8 .,.,,,.. NaXaaaa`
      P1 244530 T 8 .,.,,,.. Xa_aaa`
      P1 244531 C 8 .,.,,,.. Rbabbaa
      P1 244532 T 9 .,.,,,..^~. EE^^^^^^A
      P1 244533 T 9 .,.,,,... BBB
      P1 244534 T 9 .$,$.,,,... @@^^^^^^E
  12. 12. SAMtools Bio::DB::Sam (BioPerl): Dump a coverage profile 2
  13. 13. SAMtools Bio::DB::Sam (BioPerl): P41630Matches : 90233333333333345555555555 666778888888899999999999 999999999999999999999999 999976666666666665444444 44443332211111111000
  14. 14. 14
  15. 15. mpileup samtools mpileup collects summary information in the input BAMs, computes the likelihood of data given each possible genotype and stores the likelihoods in the BCF format. bcftools view applies the prior and does the actual calling. Finally, we filter. 15
  16. 16. SNP call1) Index the potato genome assembly (again!)samtools faidx in.fasta2) Run mpileup to generate VCF formatsamtools mpileup -ug -f in.fasta my1.bam.s my2.bam.s > my.raw.bcf Actually, all we did (I think) is perform a format conversion (BAM to VCF).
  17. 17. VCF format 17
  18. 18. VCF formatA standard format for sequence variation: SNPs, indels and structural variants.Compressed and indexed.Developed for the 1000 Genomes Project.VCFtools for VCF like SAMtools for SAM.Specification and tools available from http://vcftools.sourceforge.net 18
  19. 19. 19
  20. 20. SNP call and filter
      1) Call SNPs: bcftools view -bvcg my.raw.bcf > my.var.bcf
      2) Filter SNPs: bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.bcf.filt
  21. 21. 21
  22. 22. Aims of the work: 1) Learn about handling RNASeq: create a SNP calling pipeline. 2) Select SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA).
  23. 23. Select SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA)
  24. 24. SNP chip (OPA) construction: A set of DM SNP positions was provided by the SolCAP project (RNASeq derived). A subset was selected for developing OPAs (Illumina’s SNP chip technology). OPAs were run, and results have now been compared to RNASeq.
  25. 25. Comparison (using an early SAMtools)
  26. 26. Comparison (using an early SAMtools)
  27. 27. 27
  28. 28. Comparison (using an early SAMtools)
  29. 29. Comparison (using new SAMtools)
  30. 30. Comparison (using new SAMtools)
  31. 31. Looking into the RNASeq data…
  32. 32. 35
  33. 33. Diagram labels: Potato genome assembly; RNASeq read library; RNASeq read library
  34. 34. 37
  35. 35. 38
  36. 36. 39
  37. 37. 40
  38. 38. 41
  39. 39. A lot more questions to answer… Track down more ‘strange’ SNPs based on the expected AFS of the two samples. Go beyond biallelic SNPs. Check the OPA base: was the right base probed by the chip?
  40. 40. Thank you for your patience!
  41. 41. OPAs in 5 steps... The DNA sample is activated for binding to paramagnetic particles.
  42. 42. OPAs in 5 steps... Three oligos are designed for each SNP locus: two allele-specific oligos (ASOs), one for each allele of the SNP site, and one locus-specific oligo (LSO).
  43. 43. OPAs in 5 steps... Several wash steps remove excess and mis-hybridized oligos. Extension of the appropriate ASO and ligation to the LSO joins information about the genotype to the address sequence on the LSO.
  44. 44. OPAs in 5 steps... The single-stranded, dye-labeled DNAs are hybridized to their complement bead type through their unique address sequences.
  45. 45. OPAs in 5 steps... Key to the assay: Scalable, multiplexing sample preparation (one tube reaction). Highly parallel array-based read-out. High-quality data: average call rates above 99%.
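
The pipeline steps on slides 5, 8, 16 and 20 can be strung together for one single-end read library. The following is a minimal sketch, not the authors' script: it only prints the commands so they can be reviewed before running. The function name `snp_pipeline_commands`, its three arguments and the output file names are assumptions made for illustration, `-a bwtsw` is one of the indexing options listed on slide 5, and the filter step is assumed to read the piped VCF from standard input.

```shell
#!/bin/sh
# Sketch only: print the legacy (SAMtools 0.1.x era) commands from the slides
# for one single-end read library. Nothing is executed here.
# snp_pipeline_commands and all output file names are hypothetical.
snp_pipeline_commands() {
    ref=$1      # reference assembly (FASTA)
    reads=$2    # single-end reads (FASTQ)
    out=$3      # prefix used for every output file

    # Align with BWA: index, align, then write SAM (single end).
    printf '%s\n' "bwa index -a bwtsw $ref"
    printf '%s\n' "bwa aln $ref $reads > $out.sai"
    printf '%s\n' "bwa samse $ref $out.sai $reads > $out.sam"
    # Convert SAM to BAM, then sort (legacy syntax takes an output prefix).
    printf '%s\n' "samtools view -S -b $out.sam > $out.bam"
    printf '%s\n' "samtools sort $out.bam $out.sorted"
    # Index the reference for SAMtools and compute genotype likelihoods.
    printf '%s\n' "samtools faidx $ref"
    printf '%s\n' "samtools mpileup -ug -f $ref $out.sorted.bam > $out.raw.bcf"
    # Call variants, then filter the calls.
    printf '%s\n' "bcftools view -bvcg $out.raw.bcf > $out.var.bcf"
    printf '%s\n' "bcftools view $out.var.bcf | vcfutils.pl varFilter > $out.flt.vcf"
}

# Example: list the commands for a hypothetical library.
snp_pipeline_commands potato.fa reads.fq lib1
```

Printing rather than executing keeps the sketch safe to try without the tools installed. These are the command forms of the SAMtools and BCFtools releases of the time; later releases changed the syntax for sorting and calling, so check the manual of the installed version before running anything.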

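The pileup text on slide 11 already contains the depth vector: the fourth column is the number of reads covering each position. As a rough sketch, assuming whitespace-separated pileup rows on standard input, the depth can be pulled out with awk. The helper names `depth_vector` and `mean_depth` are hypothetical, and the three sample rows are copied from slide 11.

```shell
#!/bin/sh
# Sketch only: reduce pileup text (as printed by samtools mpileup -f) to a
# depth vector. Column 1 = sequence, 2 = position, 3 = reference base,
# 4 = read depth. Both helper names are hypothetical.
depth_vector() {
    # Keep sequence name, position and depth, tab-separated.
    awk '{ print $1 "\t" $2 "\t" $4 }'
}

mean_depth() {
    # Average the depth column over all rows read from standard input.
    awk '{ sum += $4; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Example input: three pileup rows copied from slide 11.
printf '%s\n' \
    'P1 244526 A 10 ...,.,,,.. BBQa`aaaa[' \
    'P1 244529 C 8 .,.,,,.. NaXaaaa`' \
    'P1 244532 T 9 .,.,,,..^~. EE^^^^^^A' | depth_vector
```

In practice the input would come from a pipe such as `samtools mpileup -f <in.fasta> <my.bam.s> | depth_vector`.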