Genotyping by Sequencing


Genotyping by Sequencing is a robust, fast, and cheap approach for high-throughput marker discovery. It has applications in crop improvement programs, where it helps identify superior genotypes.

Published in: Education, Technology

  1. 1. Presenter: Lovejot
  2. 2. Brief Outline: Methods for high-throughput marker discovery. Genotyping-by-sequencing strategy. Bioinformatics pipeline. Need for modifying genotyping-by-sequencing. Applications of genotyping-by-sequencing.
  3. 3.
  4. 4. Traditional Marker Discovery: costly and cannot be parallelized; time-consuming cloning and primer design; scoring is expensive and laborious.
  5. 5. NGS-Based Marker Discovery: discovery, sequencing, and genotyping of a large number of markers; fast; parallelized library preparation.
  6. 6. Sequencing is rapidly becoming so inexpensive that it will soon be reasonable to use it for every genetic study (Poland et al., 2012).
  7. 7. Enrichment strategies: long-range PCR amplification (using molecular inversion probes) and DNA hybridization/sequence capture methods. These are time-consuming, technologically challenging, and can be cost-prohibitive for assaying large numbers of samples. Complexity reduction using restriction enzymes: easy and quick; extremely specific; highly reproducible; can reach important regions of the genome that are inaccessible to sequence capture approaches; repetitive regions of genomes can be avoided and lower-copy regions targeted with two- to three-fold higher efficiency; simplifies computationally challenging alignment problems in species with high levels of genetic diversity.
  8. 8. Methods for high-throughput marker discovery using NGS (Davey et al., 2011). Reduced-representation sequencing: reduced-representation libraries (RRLs)/CRoPS; restriction-site-associated DNA sequencing (RAD-seq); multiplexed shotgun genotyping; genotyping by sequencing. Applicable to model organisms with high-quality reference genome sequences and to non-model species with no existing genomic data.
  9. 9. Comparison of current genotyping methods using next-generation sequencing (Poland et al., 2012)
  10. 10. GBS provides many advantages. It offers a much simplified library preparation procedure that can be performed with small amounts of starting DNA (100–200 ng) and is amenable to a high level of multiplexing (Davey et al., 2011).
  11. 11. Classical approach: marker discovery, then assay design, then genotyping. Genotyping by sequencing: marker discovery and genotyping in a single step.
  12. 12. Features of Genotyping by Sequencing (Poland et al., 2012): Marker discovery and genotyping are completed at the same time. Facilitates exploration of new germplasm sets. Raw data are dynamic: the raw sequences obtained from GBS can be re-analyzed. Reduced sample handling. Few PCR and purification steps. No DNA size fractionation. Efficient barcoding system.
  13. 13. Computational Biology Service Unit, Cornell University
  14. 14. GBS barcodes: Barcode sets are enzyme specific. Barcodes must not recreate the enzyme recognition site. Barcodes must be different enough from each other, with at least 3 bp differences among them. No mononucleotide runs of 3 or more bases.
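
The barcode rules on this slide can be checked mechanically. The following is a minimal sketch, not the tool used by the presenter. It assumes ApeKI (recognition site GCWGC, where W is A or T) as the enzyme, assumes that "recreating the site" means the site appears anywhere in the barcode joined to the cut-site remnant, and counts a length difference between two barcodes as extra mismatches. All function names are hypothetical.

```python
import re
from itertools import combinations

# Assumption: ApeKI, recognition site GCWGC (W = A or T).
APEKI_SITE = re.compile(r"GC[AT]GC")
# Assumption: bases that follow the barcode in a read after ligation to the cut end.
APEKI_REMNANTS = ("CAGC", "CTGC")


def mismatches(a, b):
    """Positional mismatches; a length difference counts as extra mismatches."""
    shared = sum(x != y for x, y in zip(a, b))
    return shared + abs(len(a) - len(b))


def has_homopolymer(barcode, run=3):
    """True if the barcode contains `run` or more identical bases in a row."""
    return re.search(r"(.)\1{%d,}" % (run - 1), barcode) is not None


def recreates_site(barcode, site=APEKI_SITE, remnants=APEKI_REMNANTS):
    """True if barcode + cut-site remnant contains a full recognition site."""
    return any(site.search(barcode + remnant) for remnant in remnants)


def check_barcodes(barcodes, min_diff=3):
    """Return a list of (barcode or pair, reason) for every rule violation."""
    problems = []
    for bc in barcodes:
        if has_homopolymer(bc):
            problems.append((bc, "mononucleotide run"))
        if recreates_site(bc):
            problems.append((bc, "recreates enzyme site"))
    for a, b in combinations(barcodes, 2):
        if mismatches(a, b) < min_diff:
            problems.append((a + "/" + b, "fewer than %d differences" % min_diff))
    return problems
```

An empty list from `check_barcodes` means the set passes all three rules under these assumptions. A different enzyme would need its own site pattern and remnants, which is one way to read "barcode sets are enzyme specific".
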
  15. 15. Steps involved in Genotyping by Sequencing: library construction for next-generation sequencing (Elshire et al., 2011)
  16. 16. Filtering and Selection of Reads (Computational Biology Service Unit, Cornell University)
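
The figure for this slide is not part of the transcript, so the exact filtering criteria are not known here. As an illustration only, the sketch below assumes a common style of GBS read filter: a read is kept if it contains no N, starts with a known barcode, and the barcode is immediately followed by the cut-site remnant (ApeKI remnants are assumed). The function name, the `min_len` threshold, and the sample names are hypothetical.

```python
def assign_read(read, barcode_to_sample, remnants=("CAGC", "CTGC"), min_len=20):
    """Return (sample, read_without_barcode), or None if the read is rejected."""
    if "N" in read:
        return None  # assumption: reads with uncalled bases are discarded
    # Try longer barcodes first so that a short barcode which is a prefix
    # of a longer one does not claim the read.
    for barcode in sorted(barcode_to_sample, key=len, reverse=True):
        if not read.startswith(barcode):
            continue
        rest = read[len(barcode):]
        # The barcode must be followed directly by the cut-site remnant.
        if rest.startswith(remnants) and len(rest) >= min_len:
            return barcode_to_sample[barcode], rest
    return None


samples = {"ACGT": "line_1", "TGCATA": "line_2"}
kept = assign_read("ACGT" + "CAGC" + "A" * 20, samples)
```

Here `kept` holds the sample name and the read with its barcode removed, ready for alignment or tag counting.
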
  17. 17. Sequence Processing (Computational Biology Service Unit, Cornell University)
  18. 18. Computational Biology Service Unit, Cornell University
  19. 19. Computational Biology Service Unit, Cornell University
  20. 20. Computational Biology Service Unit, Cornell University
  21. 21. Computational Biology Service Unit, Cornell University
  22. 22. Computational Biology Service Unit, Cornell University
  23. 23. Computational Biology Service Unit, Cornell University
  24. 24. Computational Biology Service Unit, Cornell University
  25. 25. Computational Biology Service Unit, Cornell University
  26. 26. Software Available
  27. 27. Bioinformatics challenges: massive amounts of data; complex genomes; missing data.
  28. 28. SNP discovery. Microarray: arrays designed based on one set of populations might not represent the SNPs in a new germplasm set; cost is higher at scale; SNP array development is very time-consuming and costly. Genotyping by sequencing: free of this bias, costs less, and requires no upfront effort.
  29. 29. Potential applications of GBS data: marker discovery; bulk segregant analysis; fine mapping of QTLs; genomic selection; GWAS.
  30. 30. Two different approaches of genotyping by sequencing: reduced representation (wheat, barley, ...) and whole-genome resequencing (done in rice and Arabidopsis).
  31. 31. Marker discovery (Cornell CBSU Workshop)
  32. 32. A genetic map for 150 rice recombinant inbred lines was constructed using the Illumina Genome Analyzer, resulting in the discovery of 1,226,791 SNPs (Huang et al., 2009).
  33. 33. The population was developed from a cross between two rice cultivars with genome sequences, Oryza sativa ssp. japonica cv. Nipponbare and Oryza sativa ssp. indica cv. 93-11. With a relatively high mapping resolution, candidate genes for some QTL of large or moderate effect were identified (Wang et al., 2011).
  34. 34. Cloning QTL is technically challenging. It requires the development of near-isogenic lines (NILs) through repeated backcrossing with one of the mapping parents, or additional samples of natural variants for association of phenotype and candidate genes. Positional cloning using NILs is time-consuming and labor-intensive because it takes a few generations of backcrossing to make NILs and thousands of recombinants to fine-map the candidate genes. A genotyping-by-sequencing approach can substantially reduce the amount of time and effort required for QTL mapping.
  35. 35. 49 QTL within relatively small genomic regions were identified for 14 agronomic traits (Wang et al., 2011).
  36. 36. Traits measured directly in the field include heading date, culm diameter, plant height, flag leaf length and flag leaf width, tiller angle, tiller number, panicle length, and awn length. Traits measured in the laboratory following harvest include grain length, grain width, grain thickness, grain weight, and spikelet number per panicle (Wang et al., 2011).
  37. 37. Five QTL of relatively large effect (14.6–46.0%) were located in small genomic regions, where strong candidate genes were found. The analysis using sequencing-based genotyping thus offers a powerful solution to map QTL with high resolution (Wang et al., 2011).
  38. 38. Developments in Genotyping by Sequencing: RAD was initially proposed by Miller (2007) and adapted to incorporate barcoding for multiplexing with Illumina sequencing technology by Baird et al. (2008). The RAD procedure has been used successfully to identify SNPs in a number of plant species including eggplant, barley, and globe artichoke. Subsequently, Elshire et al. (2011) proposed a method for the construction of highly multiplexed reduced-complexity genotyping by sequencing (GBS) libraries. The procedure is based on a restriction digestion technique similar to RAD but is substantially less complicated, which saves time and cost in library preparation. The resulting data, however, contain a larger number of missing genotype calls.
  39. 39. An Improved Genotyping by Sequencing (GBS) Approach Offering Increased Versatility and Efficiency of SNP Discovery and Genotyping (Sonah et al., 2013). Selection of an appropriate enzyme for GBS in soybean: a uniform distribution of ApeKI restriction sites was observed following in silico digestion of the soybean genome, and a good proportion of the resulting fragments were short enough for effective amplification and sequencing on the Illumina platform.
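
An in silico digestion of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used by Sonah et al.: it assumes ApeKI cuts after the first G of GCWGC, scans only the given strand of a plain sequence string, and uses a hypothetical size window to stand in for "short enough for effective amplification". Function names are hypothetical.

```python
import re


def digest(sequence, site_pattern=r"GC[AT]GC", cut_offset=1):
    """Cut `sequence` at every recognition site and return the fragments.

    cut_offset is the number of bases of the site left on the upstream
    fragment (assumed to be 1 for ApeKI, which cuts G^CWGC).
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping sites are all found.
    cuts = [m.start() + cut_offset
            for m in re.finditer("(?=%s)" % site_pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def fraction_in_range(fragments, min_len, max_len):
    """Share of fragments whose length lies in [min_len, max_len]."""
    if not fragments:
        return 0.0
    in_range = sum(min_len <= len(f) <= max_len for f in fragments)
    return in_range / len(fragments)


fragments = digest("AAAGCAGCTTTTGCTGCAA")
share = fraction_in_range(fragments, 5, 10)  # toy window, not a real size cutoff
```

On a real genome the first and last fragment of each chromosome have a cut site at only one end, so a more careful version would exclude them before computing the share.
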
  40. 40. (Sonah et al.,2013)
  41. 41. Summary of sequenced raw and processed reads in eight soybean genotypes obtained on an Illumina Genome Analyzer II (Sonah et al., 2013). The number of sorted raw sequence reads ranged from 0.44 million (TGx1989-53F) to 1.00 million (Ocepara-4). A total of 5.50 million processed quality reads (98.76% of all reads) were retained. Processed reads of the individual genotypes were mapped onto the reference genome, and only reads mapping to a unique location in the genome were retained. Such uniquely mapped reads represented 85% of the total and were well distributed across the chromosomes.
  42. 42. Sequence coverage and SNP distribution (Sonah et al., 2013). (a) Distribution of mapped sequence reads and SNPs identified using a GBS approach; the frequency of SNPs on the twenty soybean chromosomes averaged 10 SNPs/Mb. (b) Frequency of genes and transposons identified in the same bins on soybean chromosomes. The distribution of SNPs closely mirrors the distribution of genic sequences; it proved to be highest in gene-rich terminal regions and lowest in highly repetitive centromeric and pericentromeric regions of chromosomes.
  43. 43. Optimizing the Number and Coverage of SNPs by the Use of Selective Primers (Sonah et al., 2013). Library construction with a common primer having 1 (A or C) or 2 (AA, AC or CC) selective bases at the 3′ end gave a significant improvement in both the number and the depth of coverage of called SNPs. Most libraries prepared using selective amplification resulted in a greater number of SNP calls with an improved depth of coverage.
  44. 44. Summary: A set of eight diverse soybean genotypes was used. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, 5.5 M reads were obtained and processed. A total of 10,120 high-quality SNPs were obtained, and the distribution of these SNPs closely mirrored the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions, and 52.5% of these were located in the coding sequence. The use of selective primers to achieve greater complexity reduction during GBS library preparation was demonstrated. The number of SNP calls could be increased by almost 40%, and their depth of coverage could be more than doubled.
  45. 45. Genomic Selection: Predicts desirable phenotypes by calculating breeding values based on genotype. Statistical power depends on using large numbers of genetic markers, so it is limited by the cost and availability of dense genome-wide marker data. GBS can be used to generate markers to characterize breeding lines and develop accurate GS models. Even modest gains from genomic selection could save years of in-field evaluation.
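
To make "calculating breeding values based on genotype" concrete, here is a minimal sketch of the prediction step only. It assumes marker effects have already been estimated by some GS model (the effect values and genotypes below are made up), that genotypes are coded -1/0/1, and that prediction accuracy is measured as the Pearson correlation between predicted and observed values, which is a common convention. Function names are hypothetical.

```python
from math import sqrt


def gebv(genotypes, effects):
    """Genomic-estimated breeding value: sum of marker code x marker effect."""
    return [sum(x * b for x, b in zip(line, effects)) for line in genotypes]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up example: 3 lines x 3 markers, genotypes coded -1 / 0 / 1.
marker_effects = [0.5, -0.2, 0.1]
lines = [[1, 0, -1], [-1, 1, 0], [0, 0, 1]]
predicted = gebv(lines, marker_effects)
observed = [5.1, 3.9, 4.6]  # hypothetical phenotypes for the same lines
accuracy = pearson(predicted, observed)
```

In practice the marker effects come from a model fitted on a training set, and accuracy is evaluated on lines that were held out of that training.
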
  46. 46. Genotyping-by-sequencing (GBS) can be used for de novo genotyping of breeding panels and to develop accurate GS models, even for the large, complex, and polyploid wheat (Triticum aestivum L.) genome. Researchers applied GBS to a set of 254 elite breeding lines from CIMMYT and developed GS models for yield, days to heading (DTH), and thousand-kernel weight (TKW) (Poland et al., 2012).
  47. 47. GBS markers led to higher genomic prediction accuracies. For both yield traits and heading date, the accuracy gain was in the range of 0.13 to 0.24. For TKW the increase was smaller (0.05) and not significant (p-value > 0.05). With a comparable number of markers, the GBS platform led to significantly higher accuracy (gains of approximately 0.15) for drought yield and heading date when compared to the DArT markers.
  48. 48. (Poland et al., 2012)
  49. 49. Summary: Researchers identified 41,371 single nucleotide polymorphisms (SNPs). Genomic-estimated breeding value prediction accuracies with GBS were 0.28 to 0.45 for grain yield, an improvement of 0.1 to 0.2 over an established marker platform for wheat (Poland et al., 2012).
  50. 50.
  51. 51. For barley the original GBS protocol has been extended to a two-restriction-enzyme system. [1] Elshire et al. (2011) A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE 6(5): e19379. doi:10.1371/journal.pone.0019379 [2] Poland et al. (2012) Development of High-Density Genetic Maps for Barley and Wheat Using a Novel Two-Enzyme Genotyping-by-Sequencing Approach. PLoS ONE 7(2): e32253. doi:10.1371/journal.pone.0032253
  52. 52.
  53. 53.
  54. 54.
  55. 55.
  56. 56. Sample: 47 animals representing 7 taurine and indicine breeds of cattle from the US and Africa. 51,414 SNPs were detected throughout all autosomes with an average distance of 48.1 kb, and 1,143 SNPs on the X chromosome at an average distance of 130.3 kb, as well as 191 on unmapped contigs (Donato et al., 2013).
  57. 57. Integration of genotyping-by-sequencing (GBS) in the context of plant breeding and genomics (Poland and Rife, 2012)
  58. 58. Conclusion: Genotyping-by-sequencing (GBS) is a rapid and robust approach for reduced-representation sequencing that combines genome-wide molecular marker discovery and genotyping. The flexibility and low cost of GBS make it an excellent tool for many applications and research questions in plant genetics and breeding. GBS will become more powerful with the continued increase of sequencing output, development of reference genomes, and improvement of bioinformatics.