Presenter: Lovejot
Brief Outline
• Methods for high-throughput marker discovery
• Genotyping-by-sequencing strategy
• Bioinformatics pipeline
• Need for modifying genotyping-by-sequencing
• Applications of genotyping-by-sequencing
http://www.maizegenetics.net/
Traditional Marker Discovery
• Costly and cannot be parallelized
• Time-consuming cloning and primer design
• Scoring: expensive and laborious
NGS-Based Marker Discovery
• Discovering, sequencing, and genotyping a large number of markers
• Fast
• Parallelized library preparation
Sequencing is rapidly becoming so inexpensive that it will soon be reasonable to use it for
every genetic study
(Poland et al.,2012)
Enrichment strategies
• Long-range PCR amplification
• Molecular inversion probes
• DNA hybridization/sequence capture methods
These are time-consuming, technologically challenging, and can be cost-prohibitive for assaying large numbers of samples.
Complexity Reduction Using Restriction Enzymes
• Easy
• Quick
• Extremely specific
• Highly reproducible
• Can reach important regions of the genome that are inaccessible to sequence capture approaches
• With an appropriate enzyme, repetitive regions of the genome can be avoided and lower-copy regions targeted with two- to three-fold higher efficiency
• Simplifies computationally challenging alignment problems in species with high levels of genetic diversity
http://www.maizegenetics.net/
Methods for high-throughput marker discovery using NGS
Reduced-representation sequencing
• Reduced-representation libraries (RRLs)/CRoPS
• Restriction-site-associated DNA sequencing (RAD-seq)
• Multiplexed shotgun genotyping (MSG)
• Genotyping by sequencing (GBS)
• Applicable to model organisms with high-quality reference genome sequences
• Applicable to non-model species with no existing genomic data
(Davey et al.,2011)
Comparison of current genotyping methods using
next-generation sequencing
(Poland et al.,2012)
(Davey et al.,2011)
GBS provides many advantages. It offers a much simplified library preparation procedure that can be performed with small amounts of starting DNA (100–200 ng) and is amenable to a high level of multiplexing.
Classical approach: marker discovery, then assay design, then genotyping
Genotyping by sequencing: marker discovery and genotyping in a single step
Features of Genotyping by Sequencing
• Marker discovery and genotyping are completed at the same time
• Facilitates exploration of new germplasm sets
• Raw data is dynamic: the raw sequences obtained from GBS can be re-analyzed
• Reduced sample handling
• Few PCR and purification steps
• No DNA size fractionation
• Efficient barcoding system
(Poland et al.,2012)
Computational Biology Service Unit, Cornell University
GBS BARCODES
• Barcode sets are enzyme-specific
• Must not recreate the enzyme recognition site
• Must be different enough from each other: at least 3 bp differences among barcodes
• No mononucleotide runs of 3 or more bases
http://www.maizegenetics.net/
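The barcode design rules above can be checked mechanically. The following is a minimal Python sketch, not the procedure used to design any published barcode set. It assumes equal-length, uppercase barcodes, and it assumes ApeKI as the enzyme (recognition site GCWGC, so reads carry the cut-site remnant CAGC or CTGC right after the barcode). The function names and the example barcodes are hypothetical.

```python
from itertools import combinations

# Assumption: ApeKI (GCWGC, W = A or T); fragments start with CAGC or CTGC.
APEKI_SITES = ("GCAGC", "GCTGC")
APEKI_REMNANTS = ("CAGC", "CTGC")


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length barcodes")
    return sum(x != y for x, y in zip(a, b))


def has_mononucleotide_run(seq: str, run: int = 3) -> bool:
    """True if any base is repeated `run` or more times in a row."""
    return any(base * run in seq for base in "ACGT")


def recreates_site(barcode: str, remnants, sites) -> bool:
    """True if barcode + cut-site remnant contains a full recognition site."""
    return any(site in barcode + remnant
               for remnant in remnants for site in sites)


def check_barcodes(barcodes, remnants, sites, min_diff: int = 3):
    """Return a list of (barcode or pair, reason) for every rule violation."""
    problems = []
    for bc in barcodes:
        if has_mononucleotide_run(bc):
            problems.append((bc, "mononucleotide run"))
        if recreates_site(bc, remnants, sites):
            problems.append((bc, "recreates enzyme site"))
    for a, b in combinations(barcodes, 2):
        if hamming(a, b) < min_diff:
            problems.append((a + "/" + b, "too similar"))
    return problems
```

Calling `check_barcodes(["CTCGA", "TGCAT"], APEKI_REMNANTS, APEKI_SITES)` returns an empty list, while a barcode ending in G, such as `ACGTG`, is flagged because `ACGTG` + `CAGC` contains `GCAGC`.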
Steps involved in Genotyping by Sequencing
Library Construction for Next Gen Sequencing
(Elshire et al.,2011)
Filtering and Selection of Reads
Computational Biology Service Unit, Cornell University
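The slide names the read filtering and selection step without detail. As an illustration only, the Python sketch below shows one common way such a step can work: keep a read only if it starts with a known barcode followed by the expected cut-site remnant, contains no N, and is long enough after the barcode is removed. The remnants (ApeKI, CAGC/CTGC), the length threshold, and the function name `demultiplex` are assumptions, not taken from the slides or from any specific pipeline.

```python
def demultiplex(reads, barcode_to_sample, remnants=("CAGC", "CTGC"), min_len=20):
    """Assign reads to samples by leading barcode and drop unusable reads.

    Returns a dict mapping sample name to a list of reads with the
    barcode stripped (the cut-site remnant is kept).
    """
    remnants = tuple(remnants)  # str.startswith needs a tuple of options
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    barcodes = sorted(barcode_to_sample, key=len, reverse=True)
    for read in reads:
        read = read.upper()
        if "N" in read:
            continue  # ambiguous base call: discard the read
        for bc in barcodes:
            rest = read[len(bc):]
            if read.startswith(bc) and rest.startswith(remnants):
                if len(rest) >= min_len:
                    by_sample[barcode_to_sample[bc]].append(rest)
                break  # barcode identified; stop searching
    return by_sample
```

Reads that carry no recognizable barcode, or a barcode not followed by the remnant, are silently dropped in this sketch; a real pipeline would usually count and report them.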
Sequence Processing
Computational Biology Service Unit, Cornell University
Software Available
Bioinformatics Challenges
• Massive amounts of data
• Complex genomes
• Missing data
SNP DISCOVERY
Microarray
• Arrays designed from one set of populations might not represent the SNPs in a new germplasm set (ascertainment bias)
• Higher cost at scale
• SNP array development is very time-consuming and costly
Genotyping By Sequencing
• Free of this bias
• Costs less
• Requires no upfront marker development
POTENTIAL APPLICATIONS OF GBS DATA
• Marker discovery
• Bulk segregant analysis
• Fine mapping of QTLs
• Genomic selection
• GWAS
Two Different Approaches of Genotyping By Sequencing
• Reduced representation: wheat, barley, ...
• Whole-genome resequencing: done in rice and Arabidopsis
MARKER DISCOVERY
Cornell CBSU Workshop
(Huang et al.,2009)
A genetic map for 150 rice recombinant inbred lines was constructed using the Illumina Genome Analyzer, resulting in the discovery of 1,226,791 SNPs.
The population was developed from a cross between two rice cultivars with genome sequences, Oryza sativa ssp. japonica cv. Nipponbare and Oryza sativa ssp. indica cv. 93-11. With a relatively high mapping resolution, candidate genes for some QTL of large or moderate effect were identified.
(Wang et al.,2011)
• Cloning QTL is technically challenging. It requires either near-isogenic lines (NILs), developed by repeated backcrossing to one of the mapping parents, or additional samples of natural variants for associating phenotype with candidate genes. Positional cloning using NILs is time-consuming and labor-intensive because it takes a few generations of backcrossing to make NILs and thousands of recombinants to fine-map the candidate genes.
• A genotyping-by-sequencing approach can substantially reduce the time and effort required for QTL mapping.
49 QTL within relatively small genomic regions for 14 agronomic traits were identified
(Wang et al.,2011)
Traits measured directly in the field include heading date, culm diameter, plant height,
flag leaf length and flag leaf width, tiller angle, tiller number, panicle length, and awn
length. Traits measured in the laboratory following harvest include grain length, grain
width, grain thickness, grain weight, and spikelet number per panicle
(Wang et al.,2011)
Five QTL of relatively large effect (14.6–46.0%) were located in small genomic regions, where strong candidate genes were found.
• Sequencing-based genotyping thus offers a powerful way to map QTL with high resolution.
(Wang et al.,2011)
Developments in Genotyping by Sequencing
• RAD was initially proposed by Miller et al. (2007) and adapted by Baird et al. (2008) to incorporate barcoding for multiplexing with Illumina sequencing technology. The RAD procedure has been used successfully to identify SNPs in a number of plant species, including eggplant, barley, and globe artichoke.
• Subsequently, Elshire et al. (2011) proposed a method for constructing highly multiplexed, reduced-complexity genotyping-by-sequencing (GBS) libraries. The procedure uses a restriction digestion technique similar to RAD but is substantially less complicated, saving time and cost in library preparation. The trade-off is that the resulting data contain a larger number of missing genotype calls.
(Sonah et al.,2013)
An Improved Genotyping by Sequencing (GBS) Approach Offering Increased Versatility and Efficiency of SNP Discovery and Genotyping
Selection of an Appropriate Enzyme for GBS in Soybean
In silico digestion of the soybean genome showed a uniform distribution of ApeKI restriction sites, and a good proportion of the resulting fragments were short enough for effective amplification and sequencing on the Illumina platform.
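An in silico digestion of this kind can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis performed by Sonah et al. ApeKI recognizes GCWGC (W = A or T) and cuts after the first G. The function names, the toy sequence, and the size cutoff are hypothetical, and the sketch ignores practical details such as methylation sensitivity.

```python
import re


def digest(sequence: str, pattern: str = "GC[AT]GC", cut_offset: int = 1):
    """Cut `sequence` at every match of `pattern`, `cut_offset` bases into the site.

    The defaults model ApeKI (G^CWGC). A lookahead is used so that
    overlapping recognition sites are all found.
    """
    seq = sequence.upper()
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={pattern})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def fraction_within(fragments, max_len: int) -> float:
    """Share of fragments no longer than `max_len` (a hypothetical size cutoff)."""
    if not fragments:
        return 0.0
    return sum(len(f) <= max_len for f in fragments) / len(fragments)
```

For the toy sequence `AAGCAGCTTGCTGCAA`, `digest` returns the fragments `AAG`, `CAGCTTG`, and `CTGCAA`. Run chromosome by chromosome on a reference genome, the fragment lengths give the kind of size distribution the slide describes.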
Summary of sequenced raw and processed reads in eight soybean
genotypes obtained on an Illumina Genome Analyzer II.
The number of sorted raw sequence reads ranged from 0.44 million reads (TGx1989-53F)
up to 1.00 million reads (Ocepara-4). A total of 5.50 million processed quality reads (98.76%
of all reads) were retained. Processed reads of the individual genotypes were mapped onto
the reference genome and only reads mapping to a unique location in the genome were
retained. Such uniquely mapped reads represented 85% of the total and were well
distributed across the chromosomes
(Sonah et al.,2013)
Sequence coverage and SNP distribution
(a) Distribution of mapped sequence reads and SNPs identified using a GBS approach. The frequency of SNPs on the twenty soybean chromosomes averaged 10 SNPs/Mb.
(b) Frequency of genes and transposons identified in the same bins on a soybean chromosome. The distribution of SNPs closely mirrors the distribution of genic sequences: it was highest in gene-rich terminal regions and lowest in highly repetitive centromeric and pericentromeric regions of chromosomes.
(Sonah et al.,2013)
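A per-bin SNP density like the one plotted in this figure can be computed as follows. This is a minimal Python sketch under stated assumptions: SNP positions are 0-based integers on a single chromosome, and the 1 Mb bin size and the function name `snp_density` are chosen for illustration rather than taken from the paper.

```python
from collections import Counter


def snp_density(positions, chrom_length: int, bin_size: int = 1_000_000):
    """Count SNPs in consecutive bins of `bin_size` bp along one chromosome.

    `positions` are assumed to be 0-based and smaller than `chrom_length`.
    Returns one count per bin, including empty bins.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]
```

With a 1 Mb bin size the counts read directly as SNPs/Mb, so gene-rich chromosome ends would show high values and pericentromeric bins low ones.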
Optimizing the Number and Coverage of SNPs by the Use of Selective Primers
Library construction with a common primer carrying 1 (A or C) or 2 (AA, AC or CC) selective bases at the 3′ end gave a significant improvement in both the number and the depth of coverage of called SNPs. Most libraries prepared using selective amplification yielded more SNP calls at a greater depth of coverage.
(Sonah et al.,2013)
SUMMARY
• A set of eight diverse soybean genotypes was used. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, 5.5 M reads were obtained and processed.
• A total of 10,120 high-quality SNPs were obtained, and their distribution closely mirrored that of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were in genic regions, and 52.5% of these were located in coding sequence.
• Selective primers proved effective for achieving greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage more than doubled.
Genomic Selection
• Predicts desirable phenotypes by calculating breeding values based on genotype.
• Statistical power depends on using large numbers of genetic markers, so it is limited by the cost and availability of dense genome-wide marker data.
• GBS can be used to generate markers to characterize breeding lines and develop accurate GS models.
• Even modest gains from genomic selection could save years of in-field evaluation.
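To make "calculating breeding values based on genotype" concrete, here is a minimal Python sketch that uses ridge regression on marker scores as a stand-in GS model. It is an assumption for illustration, not necessarily the model used in the studies cited in this deck. Markers are assumed to be coded numerically (for example 0/1/2) with no missing values, and the function name `ridge_gebv` and the penalty `lam` are hypothetical.

```python
import numpy as np


def ridge_gebv(M_train, y_train, M_new, lam: float = 1.0):
    """Predict breeding values for new lines from marker scores.

    M_train: (n_lines, n_markers) marker matrix of phenotyped lines
    y_train: (n_lines,) phenotypes of those lines
    M_new:   (m_lines, n_markers) marker matrix of lines to predict
    lam:     ridge penalty shrinking all marker effects toward zero
    """
    M_train = np.asarray(M_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    M_new = np.asarray(M_new, dtype=float)

    mu = y_train.mean()
    col_means = M_train.mean(axis=0)
    X = M_train - col_means  # center markers on the training set
    n_markers = X.shape[1]
    # Solve (X'X + lam * I) beta = X'(y - mu) for the marker effects.
    beta = np.linalg.solve(X.T @ X + lam * np.eye(n_markers),
                           X.T @ (y_train - mu))
    return mu + (M_new - col_means) @ beta
```

Prediction accuracy is then usually reported as the correlation between predicted and observed values in lines held out of training, for example `np.corrcoef(predicted, observed)[0, 1]`.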
Genotyping-by-sequencing (GBS) can be used for de novo genotyping of breeding panels and to develop accurate GS models, even for the large, complex, and polyploid wheat (Triticum aestivum L.) genome. Researchers applied GBS to a set of 254 elite breeding lines from CIMMYT and developed GS models for yield, days to heading (DTH), and thousand-kernel weight (TKW).
(Poland et al.,2012)
GBS markers led to higher genomic prediction accuracies. For both yield traits and heading date, the accuracy gain was in the range of 0.13 to 0.24. For TKW the increase was smaller (0.05) and not significant (p-value > 0.05). With a comparable number of markers, the GBS platform led to significantly higher accuracy (gains of approximately 0.15) for drought yield and heading date when compared to the DArT markers.
(Poland et al.,2012)
SUMMARY
Researchers identified 41,371 single nucleotide polymorphisms (SNPs). Genomic-estimated breeding value prediction accuracies with GBS were 0.28 to 0.45 for grain yield, an improvement of 0.1 to 0.2 over an established marker platform for wheat.
(Poland et al.,2012)
www.lifetechnologies.com/gbs
For barley, the original GBS protocol has been extended to a two-restriction-enzyme system.
[1] Elshire et al. (2011) A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE 6(5): e19379. doi:10.1371/journal.pone.0019379
[2] Poland et al. (2012) Development of High-Density Genetic Maps for Barley and Wheat Using a Novel Two-Enzyme Genotyping-by-Sequencing Approach. PLoS ONE 7(2): e32253. doi:10.1371/journal.pone.0032253
(Donato et al.,2013)
GBS was applied to 47 animals representing 7 taurine and indicine breeds of cattle from the US and Africa. A total of 51,414 SNPs were detected across all autosomes at an average spacing of 48.1 kb, and 1,143 SNPs on the X chromosome at an average spacing of 130.3 kb, as well as 191 on unmapped contigs.
Integration of genotyping-by-sequencing (GBS) in the context of plant breeding and genomics
(Poland and Rife, 2012)
Conclusion
• Genotyping-by-sequencing (GBS) is a rapid and robust approach for reduced-representation sequencing that combines genome-wide molecular marker discovery and genotyping.
• The flexibility and low cost of GBS make it an excellent tool for many applications and research questions in plant genetics and breeding.
• GBS will become more powerful with the continued increase of sequencing output, development of reference genomes, and improvement of bioinformatics.

Genotyping by Sequencing
