Potato SNPs


Dan Bolser and David Martin

  Next Gen Bug, Dundee
       01/18/2010



Aims of the work
1) Learn about handling RNASeq
     −   Create a SNP calling pipeline


2) Select SNPs for genetic mapping
     −   Using Illumina's GoldenGate SNP chip (OPA)

Creating a SNP calling pipeline




Align (using BWA)
1) Index the potato genome assembly
bwa index [-a bwtsw|div|is] [-c] <in.fasta>
2) Perform the alignment
bwa aln [options] <in.fasta> <in.fq>
3) Output results in SAM format (single end)
bwa samse <in.fasta> <in.sai> <in.fq>
Align (using Bowtie)
1) Index the potato genome assembly
bowtie-build [options] <in.fasta> <ebwt>
2) Perform the alignment and output results
bowtie [options] <ebwt> <in.fq>
Convert (using SAMtools)
1) Convert SAM to BAM for sorting
samtools view -S -b <in.sam>
2) Sort BAM for SNP calling
samtools sort <in.bam> <out.bam.s>


  Alignments are both compressed for long-term
storage and sorted for variant discovery.
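
A minimal sketch of how the BWA and SAMtools steps above chain together, written in Python. The names `alignment_commands` and `run_all` and all file names are hypothetical. It assumes `bwa aln` and `bwa samse` write to standard output, so the `.sai` file read by `samse` is the redirected output of `aln`. It uses the older `samtools sort <in.bam> <out.prefix>` form shown on the slide; newer SAMtools releases use a different `sort` syntax.

```python
import subprocess


def alignment_commands(ref_fasta, reads_fq, prefix):
    """Return (argv, stdout_file) pairs for the align and convert steps.

    Nothing is executed here. A stdout_file of None means the tool
    names its own output files.
    """
    sai = f"{prefix}.sai"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    return [
        (["bwa", "index", ref_fasta], None),
        (["bwa", "aln", ref_fasta, reads_fq], sai),
        (["bwa", "samse", ref_fasta, sai, reads_fq], sam),
        (["samtools", "view", "-S", "-b", sam], bam),
        # Old-style sort: the last argument is an output prefix.
        (["samtools", "sort", bam, f"{prefix}.sorted"], None),
    ]


def run_all(commands):
    """Run each command in order, redirecting stdout where requested."""
    for argv, stdout_file in commands:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)


if __name__ == "__main__":
    # Print the planned commands instead of running them.
    for argv, stdout_file in alignment_commands("potato.fa", "reads.fq", "lib1"):
        target = f" > {stdout_file}" if stdout_file else ""
        print(" ".join(argv) + target)
```

Calling `run_all(alignment_commands(...))` would execute the steps, provided `bwa` and `samtools` are installed and on the PATH.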

Coverage profiles / Depth vectors

SAMtools...

    Dump a coverage profile
samtools mpileup -f <in.fasta> <my.bam.s>
    P1   244526   A   10   ...,.,,,..      BBQa`aaaa[
    P1   244527   A   10   ...,.,,,..      BBZ_`^a_a[
    P1   244528   C   10   .$.$.,.,,,..    >>RaZ`aaaa
    P1   244529   C    8   .,.,,,..        NaXaaaa`
    P1   244530   T    8   .,.,,,..        Xa_aaa`
    P1   244531   C    8   .,.,,,..        Rbabbaa
    P1   244532   T    9   .,.,,,..^~.     EE^^^^^^A
    P1   244533   T    9   .,.,,,...       BBB
    P1   244534   T    9   .$,$.,,,...     @@^^^^^^E
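
The pileup columns above are: sequence name, position, reference base, depth, read bases and base qualities. In the read-bases column, `.` and `,` mark matches to the reference on the forward and reverse strand, `^` marks a read start (followed by one mapping-quality character) and `$` marks a read end. A minimal Python sketch for reading such lines follows; the function names are hypothetical and only the leading columns are handled.

```python
def parse_pileup_line(line):
    """Split one pileup line into (name, position, ref_base, depth, bases)."""
    fields = line.split()
    bases = fields[4] if len(fields) > 4 else ""
    return fields[0], int(fields[1]), fields[2], int(fields[3]), bases


def count_reference_matches(bases):
    """Count reads that match the reference in a pileup read-bases string."""
    matches = 0
    i = 0
    while i < len(bases):
        char = bases[i]
        if char == "^":
            # Read start: skip the marker and its mapping-quality character.
            i += 2
        elif char in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            if char in ".,":
                matches += 1
            i += 1
    return matches


if __name__ == "__main__":
    line = "P1   244532   T    9   .,.,,,..^~.     EE^^^^^^A"
    name, pos, ref, depth, bases = parse_pileup_line(line)
    print(name, pos, ref, depth, count_reference_matches(bases))
```

A site where the match count is well below the depth is a candidate variant position.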

SAMtools Bio::DB::Sam (BioPerl)
Dump a coverage profile 2

SAMtools Bio::DB::Sam (BioPerl)
P41630
Matches : 9
0233333333333345555555555
 666778888888899999999999
 999999999999999999999999
 999976666666666665444444
 44443332211111111000
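
A depth vector records, for each reference position, how many reads cover it. The Python sketch below computes one from alignment intervals; `depth_vector`, `as_digits` and the example intervals are hypothetical. Printing each depth as a single digit capped at 9 is only an assumption about how a profile like the one above might be rendered.

```python
def depth_vector(length, alignments):
    """Per-base read depth over a reference of the given length.

    alignments is an iterable of (start, end) pairs, 0-based and
    half-open, one pair per aligned read.
    """
    # Difference array: +1 where a read starts, -1 just past its end.
    diff = [0] * (length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def as_digits(depth):
    """Render a depth vector as one digit per base, capped at 9 (assumed)."""
    return "".join(str(min(d, 9)) for d in depth)


if __name__ == "__main__":
    reads = [(0, 4), (2, 6), (5, 10)]
    print(as_digits(depth_vector(10, reads)))
```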

mpileup

    samtools mpileup collects summary
    information in the input BAMs, computes the
    likelihood of data given each possible
    genotype and stores the likelihoods in the
    BCF format.

    bcftools view applies the prior and does the
    actual calling.

    Finally, we filter.
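
To illustrate the two stages described above (likelihoods first, then a prior), here is a deliberately simplified Python sketch. It is not the model SAMtools uses: it assumes a diploid site, independent reads, Phred-scaled base qualities, and an error that is equally likely to produce any of the three other bases. The function names and the prior values are hypothetical.

```python
def genotype_likelihoods(bases, quals, ref, alt):
    """P(reads | genotype) for ref/ref (RR), ref/alt (RA) and alt/alt (AA)."""
    likes = {"RR": 1.0, "RA": 1.0, "AA": 1.0}
    for base, qual in zip(bases, quals):
        err = 10 ** (-qual / 10)  # Phred quality to error probability
        p_if_ref = 1 - err if base == ref else err / 3
        p_if_alt = 1 - err if base == alt else err / 3
        likes["RR"] *= p_if_ref
        likes["RA"] *= 0.5 * p_if_ref + 0.5 * p_if_alt
        likes["AA"] *= p_if_alt
    return likes


def call_genotype(likes, priors):
    """Apply priors to the likelihoods and return (best genotype, posteriors)."""
    post = {g: likes[g] * priors[g] for g in likes}
    total = sum(post.values())
    post = {g: p / total for g, p in post.items()}
    best = max(post, key=post.get)
    return best, post


if __name__ == "__main__":
    # Hypothetical site: reference A, four reads show A and four show G.
    likes = genotype_likelihoods("AAAAGGGG", [30] * 8, ref="A", alt="G")
    priors = {"RR": 0.998, "RA": 0.001, "AA": 0.001}  # illustrative only
    print(call_genotype(likes, priors))
```

Splitting the work this way means the likelihoods can be stored once and re-called later under a different prior.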
SNP call
1) Index the potato genome assembly (again!)
samtools faidx in.fasta
2) Run 'mpileup' to generate BCF (binary VCF) format
samtools mpileup -ug -f in.fasta my1.bam.s my2.bam.s > my.raw.bcf

    Actually, all we did (I think) is summarise the alignments as
    per-site genotype likelihoods (BAM to BCF). No SNPs are called yet.
VCF format




VCF format
A standard format for sequence variation:
  SNPs, indels and structural variants.
Compressed and indexed.
Developed for the 1000 Genomes Project.
VCFtools is to VCF what SAMtools is to SAM.
Specification and tools available from
 http://vcftools.sourceforge.net
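
A VCF data line is tab-separated, with eight fixed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. The Python sketch below reads one such line into a dictionary. The function name and the example record are hypothetical, and per-sample genotype columns are ignored.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    # A missing value is written as "." in VCF.
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    record["ALT"] = record["ALT"].split(",")
    info = {}
    for item in record["INFO"].split(";"):
        if not item or item == ".":
            continue
        if "=" in item:
            key, _, value = item.partition("=")
            info[key] = value
        else:
            info[item] = True  # flag entry with no value
    record["INFO"] = info
    return record


if __name__ == "__main__":
    # Hypothetical record, not taken from the potato data.
    example = "P1\t244530\t.\tT\tC\t45.0\t.\tDP=8;AF1=0.5"
    print(parse_vcf_line(example))
```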
SNP call and filter
1) Call SNPs
bcftools view -bvcg my.raw.bcf > my.var.bcf
2) Filter SNPs
bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.flt.vcf


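
As a rough picture of what the filtering step does, the Python sketch below keeps VCF records whose quality and read depth fall inside chosen limits. It is not a reimplementation of `vcfutils.pl varFilter`: the function names and thresholds are hypothetical, and it assumes the depth is given as a `DP=` entry in the INFO column.

```python
def passes_filter(qual, depth, min_qual=20.0, min_depth=3, max_depth=100):
    """True if a call meets the (illustrative) quality and depth limits."""
    return qual >= min_qual and min_depth <= depth <= max_depth


def filter_vcf_lines(lines, **thresholds):
    """Yield header lines unchanged and data lines that pass the filter."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = 0.0 if fields[5] == "." else float(fields[5])
        depth = 0
        for item in fields[7].split(";"):
            if item.startswith("DP="):
                depth = int(item[3:])
        if passes_filter(qual, depth, **thresholds):
            yield line


if __name__ == "__main__":
    # Hypothetical records, not taken from the potato data.
    records = [
        "##fileformat=VCFv4.0",
        "P1\t100\t.\tA\tG\t50\t.\tDP=10",
        "P1\t200\t.\tC\tT\t5\t.\tDP=10",
        "P1\t300\t.\tG\tA\t60\t.\tDP=500",
    ]
    for kept in filter_vcf_lines(records):
        print(kept)
```

An upper depth limit is included because unusually deep sites can indicate reads piling up from repeated sequence.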
Aims of the work
1) Learn about handling RNASeq
     −   Create a SNP calling pipeline


2) Select SNPs for genetic mapping
     −   Using Illumina's GoldenGate SNP chip (OPA)

Select SNPs for genetic mapping
 Using Illumina's GoldenGate SNP chip (OPA)




SNP chip (OPA) construction

    A set of DM SNP positions was provided by
    the SolCAP project (RNASeq derived).

    A subset was selected for developing OPAs
    (Oligo Pool Assays, Illumina’s SNP chip technology).

    OPAs were run, and results have now been
    compared to RNASeq.


Comparison (using an early SAMtools)
Comparison (using new SAMtools)
Looking into the RNASeq data…




[Diagram: potato genome assembly and two RNASeq read libraries]

A lot more questions to answer…

    Track down more ‘strange’ SNPs based on
    the expected AFS (allele frequency spectrum) of the two samples.

    Go beyond biallelic SNPs

    Check the OPA base...
    −   Was the right base probed by the chip?

Thank you for your patience!




OPAs in 5 steps...
         The DNA sample is activated for binding to paramagnetic particles.
OPAs in 5 steps...
         Three oligos are designed for each SNP locus: two Allele-Specific
         Oligos (ASOs), one for each allele of the SNP site, and one
         Locus-Specific Oligo (LSO).
OPAs in 5 steps...
         Several wash steps remove excess and mis-hybridized oligos.
         Extension of the appropriate ASO and ligation to the LSO joins
         information about the genotype to the address sequence on the LSO.
OPAs in 5 steps...
         The single-stranded, dye-labeled DNAs are hybridized to their
         complement bead type through their unique address sequences.
OPAs in 5 steps...
         Key to the assay:
         Scalable, multiplexing sample preparation (one tube reaction).
         Highly parallel array-based read-out.
         High-quality data: average call rates above 99% accuracy.

Potato SNP Calling and Genetic Mapping

  • 1. Potato SNPs. Dan Bolser and David Martin. Next Gen Bug, Dundee, 01/18/2010
  • 2. Aims of the work: 1) Learn about handling RNASeq: create a SNP calling pipeline. 2) Select SNPs for genetic mapping, using Illumina's GoldenGate SNP chip (OPA).
  • 3. Creating a SNP calling pipeline
  • 5. Align (using BWA)
       1) Index the potato genome assembly:
          bwa index [-a bwtsw|div|is] [-c] <in.fasta>
       2) Perform the alignment:
          bwa aln [options] <in.fasta> <in.fq>
       3) Output results in SAM format (single end):
          bwa samse <in.fasta> <in.sai> <in.fq>
  • 6. Align (using Bowtie)
       1) Index the potato genome assembly:
          bowtie-build [options] <in.fasta> <ebwt>
       2) Perform the alignment and output results:
          bowtie [options] <ebwt> <in.fq>
  • 8. Convert (using SAMtools)
       1) Convert SAM to BAM for sorting:
          samtools view -S -b <in.sam>
       2) Sort BAM for SNP calling:
          samtools sort <in.bam> <out.bam.s>
       Alignments are both compressed for long-term storage and sorted for variant discovery.
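The BWA and SAMtools steps above chain together through intermediate files, because `bwa aln`, `bwa samse` and `samtools view` all write to standard output. The sketch below only builds and prints the command lines (a dry run); it does not execute anything. The file names (`potato.fa`, `reads.fq`, `sample1`) are hypothetical, and the `samtools sort <in.bam> <out.prefix>` form is the legacy syntax used on the slides (newer SAMtools releases take `-o` instead).

```python
import shlex


def alignment_plan(ref, reads, prefix):
    """Return (argv, stdout_path) pairs; stdout_path is None when nothing is redirected."""
    sai = f"{prefix}.sai"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    return [
        (["bwa", "index", ref], None),                       # index the assembly
        (["bwa", "aln", ref, reads], sai),                   # align, writes .sai to stdout
        (["bwa", "samse", ref, sai, reads], sam),            # single-end SAM to stdout
        (["samtools", "view", "-S", "-b", sam], bam),        # SAM -> BAM on stdout
        (["samtools", "sort", bam, f"{prefix}.sorted"], None),  # legacy: output prefix
    ]


def render(plan):
    """Turn the plan into shell command lines with explicit redirections."""
    lines = []
    for argv, out in plan:
        cmd = " ".join(shlex.quote(arg) for arg in argv)
        lines.append(f"{cmd} > {shlex.quote(out)}" if out else cmd)
    return lines


# Hypothetical file names, for illustration only.
commands = render(alignment_plan("potato.fa", "reads.fq", "sample1"))
for line in commands:
    print(line)
```

Keeping the plan as data makes it easy to review the redirections before running anything for real.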
  • 10. Coverage profiles / Depth vectors
  • 11. SAMtools...
       Dump a coverage profile:
          samtools mpileup -f <in.fasta> <my.bam.s>
       P1 244526 A 10 ...,.,,,.. BBQa`aaaa[
       P1 244527 A 10 ...,.,,,.. BBZ_`^a_a[
       P1 244528 C 10 .$.$.,.,,,.. >>RaZ`aaaa
       P1 244529 C 8 .,.,,,.. NaXaaaa`
       P1 244530 T 8 .,.,,,.. Xa_aaa`
       P1 244531 C 8 .,.,,,.. Rbabbaa
       P1 244532 T 9 .,.,,,..^~. EE^^^^^^A
       P1 244533 T 9 .,.,,,... BBB
       P1 244534 T 9 .$,$.,,,... @@^^^^^^E
  • 12. SAMtools Bio::DB::Sam (BioPerl): Dump a coverage profile 2
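Each pileup line carries the sequence name, 1-based position, reference base, read depth, read bases and base qualities, in that order. As a minimal sketch (not part of the original pipeline), the depth column can be pulled out into a coverage profile like this; the three sample rows are copied from the slide.

```python
from typing import Iterable, Iterator, NamedTuple


class PileupRow(NamedTuple):
    seq_id: str
    pos: int      # 1-based position on the reference
    ref: str      # reference base
    depth: int    # number of reads covering the position


def parse_pileup(lines: Iterable[str]) -> Iterator[PileupRow]:
    """Yield the first four pileup columns; bases and qualities are ignored here."""
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        yield PileupRow(fields[0], int(fields[1]), fields[2], int(fields[3]))


sample = """\
P1 244526 A 10 ...,.,,,.. BBQa`aaaa[
P1 244529 C 8 .,.,,,.. NaXaaaa`
P1 244532 T 9 .,.,,,..^~. EE^^^^^^A
"""

rows = list(parse_pileup(sample.splitlines()))
depths = [row.depth for row in rows]
print(depths)  # [10, 8, 9]
```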
  • 13. SAMtools Bio::DB::Sam (BioPerl)
       P41630
       Matches : 9
       0233333333333345555555555
       666778888888899999999999
       999999999999999999999999
       999976666666666665444444
       44443332211111111000
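The digit rows above read like a per-base depth vector printed one digit per position. Under that assumption, a profile of this kind can be computed from read intervals as sketched below. This is a plain-Python stand-in, not the Bio::DB::Sam API; the read coordinates are made up, and capping each depth at 9 so it fits in one digit is also an assumption.

```python
def depth_vector(intervals, length):
    """Per-base depth over [0, length) from half-open (start, end) read intervals."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1   # coverage begins here
            diff[end] -= 1     # coverage stops just before here
    depths, running = [], 0
    for delta in diff[:length]:
        running += delta
        depths.append(running)
    return depths


def as_digits(depths):
    """Render depths one digit per base, capping at 9 (an assumed display choice)."""
    return "".join(str(min(depth, 9)) for depth in depths)


# Hypothetical reads on a 10-base sequence.
reads = [(1, 6), (2, 8), (2, 5)]
profile = depth_vector(reads, 10)
print(as_digits(profile))  # 0133321100
```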
  • 15. mpileup
       - samtools mpileup collects summary information in the input BAMs, computes the likelihood of the data given each possible genotype and stores the likelihoods in the BCF format.
       - bcftools view applies the prior and does the actual calling.
       - Finally, we filter.
  • 16. SNP call
       1) Index the potato genome assembly (again!):
          samtools faidx in.fasta
       2) Run 'mpileup' to generate VCF format:
          samtools mpileup -ug -f in.fasta my1.bam.s my2.bam.s > my.raw.bcf
       Actually, all we did (I think) is perform a format conversion (BAM to VCF).
  • 18. VCF format: a standard format for sequence variation (SNPs, indels and structural variants). Compressed and indexed. Developed for the 1000 Genomes Project. VCFtools is to VCF what SAMtools is to SAM. Specification and tools are available from http://vcftools.sourceforge.net
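To make "likelihood of the data given each possible genotype" and "applies the prior" concrete, here is a deliberately simplified diploid model. It is not the model SAMtools or bcftools actually implement. It assumes independent reads, an error probability derived from the Phred quality, errors spread evenly over the three other bases, and a hypothetical genotype prior.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def log_likelihood(genotype, bases, quals):
    """log P(observed bases | diploid genotype) under the simplified model."""
    total = 0.0
    for base, q in zip(bases, quals):
        err = phred_to_error(q)
        # Each read samples one of the two alleles with probability 1/2.
        p = sum(0.5 * ((1 - err) if base == allele else err / 3)
                for allele in genotype)
        total += math.log(p)
    return total


# Made-up observation: four A reads and four G reads, all at Q30.
bases = "AAAGGGGA"
quals = [30] * len(bases)
genotypes = ["AA", "AG", "GG"]

loglik = {g: log_likelihood(g, bases, quals) for g in genotypes}

# Hypothetical prior favouring the homozygous reference genotype.
prior = {"AA": 0.9985, "AG": 0.001, "GG": 0.0005}
log_post = {g: loglik[g] + math.log(prior[g]) for g in genotypes}

call = max(log_post, key=log_post.get)
print(call)  # AG
```

With balanced A/G support, the heterozygote wins despite its small prior, because each homozygote has to explain four reads as sequencing errors.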
  • 20. SNP call and filter
       1) Call SNPs:
          bcftools view -bvcg my.raw.bcf > my.var.bcf
       2) Filter SNPs:
          bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.bcf.filt
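The filtering step works on VCF text records. As an illustration of the idea only (these are not varFilter's actual rules or defaults), the sketch below keeps records whose QUAL and INFO `DP` (read depth) fall within hypothetical thresholds. The three records are invented for the example.

```python
def info_field(info, key):
    """Return the value of KEY=value in a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def keep_record(line, min_qual=20.0, min_depth=3, max_depth=100):
    """True if a VCF data line passes the (hypothetical) QUAL and DP thresholds."""
    fields = line.rstrip("\n").split("\t")
    qual = fields[5]                        # column 6: QUAL
    depth = info_field(fields[7], "DP")     # column 8: INFO
    if qual == "." or depth is None:
        return False
    return float(qual) >= min_qual and min_depth <= int(depth) <= max_depth


# Invented records, for illustration only.
vcf_lines = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "P1\t244528\t.\tC\tT\t45.0\t.\tDP=10",
    "P1\t244533\t.\tT\tG\t8.5\t.\tDP=9",     # fails: low QUAL
    "P1\t250000\t.\tA\tC\t60.0\t.\tDP=250",  # fails: depth above max_depth
]

# Header lines are passed through unchanged.
kept = [line for line in vcf_lines if line.startswith("#") or keep_record(line)]
print(len(kept))  # 3
```

A maximum depth is included because unusually deep positions often indicate repeats or collapsed paralogues rather than real variation.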
  • 22. Aims of the work: 1) Learn about handling RNASeq: create a SNP calling pipeline. 2) Select SNPs for genetic mapping, using Illumina's GoldenGate SNP chip (OPA).
  • 23. Select SNPs for genetic mapping, using Illumina's GoldenGate SNP chip (OPA)
  • 24. SNP chip (OPA) construction
       - A set of DM SNP positions was provided by the SolCAP project (RNASeq derived).
       - A subset was selected for developing OPAs (Illumina’s SNP chip technology).
       - OPAs were run, and results have now been compared to RNASeq.
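One simple way to compare chip genotypes with RNASeq-derived calls is a per-marker concordance rate over the markers both sources share. The sketch below is a generic illustration, not the comparison actually performed for these slides. The marker names and genotypes are invented, and genotypes are normalised so that "GA" and "AG" count as the same call.

```python
def normalise(genotype):
    """Order-independent genotype string, e.g. 'GA' -> 'AG'."""
    return "".join(sorted(genotype.upper()))


def concordance(chip_calls, rnaseq_calls):
    """Return (shared markers, agreeing markers, agreement rate)."""
    shared = sorted(set(chip_calls) & set(rnaseq_calls))
    agree = [marker for marker in shared
             if normalise(chip_calls[marker]) == normalise(rnaseq_calls[marker])]
    rate = len(agree) / len(shared) if shared else float("nan")
    return shared, agree, rate


# Invented calls, for illustration only.
chip = {"snp1": "AG", "snp2": "CC", "snp3": "TC"}
rnaseq = {"snp1": "GA", "snp2": "CT", "snp3": "CT", "snp4": "AA"}

shared, agree, rate = concordance(chip, rnaseq)
print(shared, agree, round(rate, 3))
```

The disagreeing markers (here `snp2`) are the ones worth inspecting by hand.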
  • 25. Comparison (using an early SAMtools)
  • 26. Comparison (using an early SAMtools)
  • 29. Comparison (using an early SAMtools)
  • 34. Looking into the RNASeq data…
  • 36. Diagram labels: Potato genome assembly; RNASeq read library (×2)
  • 42. A lot more questions to answer…
       - Track down more ‘strange’ SNPs based on the expected AFS of the two samples.
       - Go beyond biallelic SNPs.
       - Check the OPA base: was the right base probed by the chip?
  • 43. Thank you for your patience!
  • 45. OPAs in 5 steps... The DNA sample is activated for binding to paramagnetic particles.
  • 46. OPAs in 5 steps... Three oligos are designed for each SNP locus. Two are Allele-Specific Oligos (ASOs), one for each allele of the SNP site, and the third is a Locus-Specific Oligo (LSO).
  • 47. OPAs in 5 steps... Several wash steps remove excess and mis-hybridized oligos. Extension of the appropriate ASO and ligation to the LSO joins information about the genotype to the address sequence on the LSO.
  • 48. OPAs in 5 steps... The single-stranded, dye-labeled DNAs are hybridized to their complement bead type through their unique address sequences.
  • 49. OPAs in 5 steps... Key to the assay: Scalable, multiplexing sample preparation (one tube reaction). Highly parallel array-based read-out. High-quality data: Average call rates above 99% accuracy.

Editor's Notes

  1. All three oligo sequences contain regions of genomic complementarity and universal PCR primer sites; the LSO also contains a unique address sequence that targets a particular bead type. Up to 1,536 SNPs may be interrogated simultaneously in this manner. During the primer hybridization process, the assay oligos hybridize to the genomic DNA sample bound to paramagnetic particles. Because hybridization occurs prior to any amplification steps, no amplification bias can be introduced into the assay.
  2. Extension of the appropriate ASO and ligation of the extended product to the LSO joins information about the genotype present at the SNP site to the address sequence on the LSO. Allele-specific primer extension (ASPE) is used to preferentially extend the correctly matched ASO (at the 3' end) up to the 5' end of the LSO primer.
  3. There is a one-to-one mapping between an address sequence on the array and the locus being scored. As a result of this labeling scheme, the PCR product consists of double-stranded DNA of which one strand, containing the complement to the Illumicode, is labeled with either Cy3 or Cy5 in an allele-specific manner, and a complementary strand labeled with biotin. The biotinylated strand is removed and the single, fluorescently labeled strand is hybridized to the BeadArray.