THEME – 4 Genomic diversity of domestication in soybean


  1. Genomic diversity of domestication in soybean. Li-juan Qiu, Institute of Crop Science, Chinese Academy of Agricultural Sciences. International Workshop on “Applied Mathematics and Omics Technologies for Discovering Biodiversity and Genetic Resources for Climate Change Mitigation and Adaptation to Sustain Agriculture in Drylands”, IAV Hassan II, Rabat, Morocco, 24-27 June 2014.
  2. Outline: (1) Background; (2) Genetic diversity of G. soja and G. max; (3) Pan-genome of G. soja; (4) Genomic variation between G. soja and G. max W82; (5) Selection genes during domestication.
  3. Section 1: Background. Taxonomy: Leguminosae, Papilionoideae, Glycine. The genus Glycine has two subgenera.
     • Subgenus Glycine: 26 perennial wild species (mainly in Australia).
     • Subgenus Soja: annual wild soybean (G. soja; East Asia) and cultivated soybean (G. max; worldwide).
  4. Glycine soja, the wild relative of cultivated soybean G. max.
     • G. soja → (domestication) → G. max landraces → (improvement) → G. max modern cultivars.
     • Two bottlenecks: domestication and breeding.
     • Gene pools (from Harlan and de Wet, 1971): secondary gene pool (GP-2), unknown; tertiary gene pool (GP-3), wild perennial species.
  5. G. max vs G. soja (figure labels: G. soja, modern cultivar). Traits that differ:
     • Plant: plant height, growth habit.
     • Seed: size, color, pod dehiscence.
     • Physiological traits: protein content, oil content.
  6. Genetic variation controls the difference.
     • Variation of the soybean genome during domestication: genetic variation (e.g. SNP, InDel, PAV, CNV), domestication-trait-related genes, domestication-related traits.
     • Open question: what is the genetic variation between wild and cultivated soybean?
  7. Cultivated soybean is native to China.
     • Soybean has been cultivated for more than 4,500 years, since the agricultural ancestor Houji, who planted five crops including soybean.
     • According to written records, the earliest name of soybean was “shu”, in “The Book of Odes”.
     • The names for soybean in other languages derive from “shu”.
  8. China holds the largest soybean germplasm collection.
     • More than 170,000 soybean accessions are held in germplasm collections. Among them, 45,000 accessions are unique (Carter et al. 2004).
     • More than 23,000 cultivated and 7,000 wild soybean accessions are conserved in the Chinese National Gene Bank (CNGB).
  9. Constructing core collections at different levels (Qiu et al. 2003, Scientia Agricultura Sinica; Qiu et al. 2009, PMB 2013).
     • Core collection: represents the genetic diversity of a crop species and its relatives with a minimum of repetitiveness.
     • Basic collection (23,587); schematic: AAAABBBB CCCCDDDDEEEE FFFGGGHHH.
     • Primary core collection (2,794); schematic: AABB CCDDEE FFGGHHH.
     • Core collections at different levels (248; 433…); schematic: ABCEFGH.
     • Method labels in the figure: location, phenotype; phenotype, genotype.
  10. Section 2: Differentiation between G. soja and G. max (Li et al. New Phytologist, 2010; Li et al. Theor Appl Genet, 2008).
      • The primary division of genetic diversity was between the wild and domesticated accessions.
      • G. soja and G. max represent distinct germplasm pools.
      • Population structure within each species accords with geographic origin, in cultivated and wild soybeans respectively.
      • Data: 1,863 landraces (59 SSR); 112 wild soybeans (99 SSR, 554 SNP).
      [Figure: clustering of G. max and G. soja at K=2 to K=6 (panels A and B; 99 SSR, 554 SNP, SSR+SNP), with regional groups S, HH, N, NE, Russia, Korea, Japan.]
  11. Genetic diversity decreased markedly after domestication.
      • Hyten et al. (2006) PNAS. Accessions: 26 G. soja, 94 G. max. Molecular data: 111 fragments from 102 genes.
      • Li et al. (2010) New Phytologist. Accessions: 92 G. soja, 279 G. max. Molecular data: 554 SNP markers, 99 SSR markers.
      [Figure values: wild 1807 and 0.871; cultivated 1473 and 0.687; 78.3%; 81.5%.]
  12. The development of sequencing techniques. Cultivated soybean reference genome: G. max W82 (from Schmutz et al., Nature 2010; 463:178-183). As an important source of genetic diversity, the gene repertoire of G. soja remains largely unexplored.
  13. Section 3: Pan-genome of G. soja (from Morgante et al. Current Opinion in Plant Biology 10, 149-155 (2007)).
      • Pan-genome: the set of all genes present in the genomes of a group of organisms.
      • Core genome: shared among individuals.
      • Dispensable genome: individual-specific, or shared by only some individuals.
  14. Why a pan-genome? (Li et al. New Phytologist, 2010)
      • AMOVA: the largest component of variation (~75%) was among individuals within populations.
      • A single genome sequence might not reflect the entire genomic complement of a species.
  15. Pan-genome of annual wild soybean.
      • Seven representative wild soybeans (New Phytologist, 2010).
      • China: Northeast, North, Huanghuai and South regions.
      • Other countries: Japan, Korea, Russia.
      • Three libraries: 180 bp, 500 bp, 2 kbp.
      • Data: 817 Gb, 111.9× on average.
      [Figure: plot with y-axis 0.0 to 1.0 and x-axis 1 to 91; labeled accessions GsojaC, GsojaB, GsojaA, GsojaG, GsojaE, GsojaD.]
  16. Summary of data and assembly.

      ID                           GsojaA   GsojaB   GsojaC   GsojaD   GsojaE   GsojaF   GsojaG
      Predicted genome size (Mbp)  981.0    1000.8   1053.78  1118.34  956.43   992.66   889.33
      Assembled genome size (Mbp)  813      895      841      985      920      886      878
      Contig N50 (Kbp)*            9        22.2     8        11       27       24.3     19.2
      Scaffold N50 (Kbp)           18.3     57.2     17       48.7     65.1     52.4     44.9
      No. of genes predicted       58,756   56,655   60,377   62,048   58,414   57,573   58,169
      No. of genes confirmed       55,061   54,256   56,542   57,631   55,901   54,805   54,797

      • Number of predicted genes: average 55,570 genes/genome.
      • RNA-Seq validation: 67.3% of predicted genes.
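
A quick arithmetic check on the table above, offered only as a sketch: the slide does not say which row the quoted average of 55,570 genes per genome was computed from. Recomputing the means of both gene-count rows suggests that it matches the "confirmed" row rather than the "predicted" row. The variable names are illustrative.

```python
# Gene counts per accession, GsojaA to GsojaG, copied from the table.
predicted = [58756, 56655, 60377, 62048, 58414, 57573, 58169]
confirmed = [55061, 54256, 56542, 57631, 55901, 54805, 54797]

mean_predicted = sum(predicted) / len(predicted)
mean_confirmed = sum(confirmed) / len(confirmed)

print(f"mean predicted genes: {mean_predicted:.1f}")
print(f"mean confirmed genes: {mean_confirmed:.1f}")
```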
  17. The pan-genome is dynamic, and a single genome does not adequately represent the diversity of the species.
      • The total number of genes increased as additional genomes were added, while the number of core genes decreased.
      • The average pan-genome size of any two accessions accounted for 78.2% of that found using all seven accessions.
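
The pan-genome bookkeeping described here can be sketched in a few lines of Python, assuming each accession is reduced to a set of gene-family identifiers. The toy gene names and the helper names `pan_and_core` and `mean_pan_fraction` are hypothetical. The talk does not describe its actual gene-clustering pipeline, so this shows only the set logic: union for the pan-genome, intersection for the core, and the average pan-genome size over all subsets of k accessions.

```python
from itertools import combinations


def pan_and_core(gene_sets):
    """Return (pan, core): the union and the intersection of the gene sets."""
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


def mean_pan_fraction(gene_sets, k):
    """Average pan-genome size over all k-accession subsets,
    as a fraction of the pan-genome built from every accession."""
    full_pan, _ = pan_and_core(gene_sets)
    sizes = [len(set().union(*combo)) for combo in combinations(gene_sets, k)]
    return sum(sizes) / len(sizes) / len(full_pan)


# Toy data: three hypothetical accessions, five hypothetical gene families.
accessions = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g3", "g5"},
]

pan, core = pan_and_core(accessions)
dispensable = pan - core
print(len(pan), len(core), len(dispensable))
print(mean_pan_fraction(accessions, 2))
```

With real data, `mean_pan_fraction(gene_sets, 2)` is the quantity the slide reports as 78.2% for the seven accessions.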
  18. Pan-genome of annual wild soybean: 59,080 genes; genome size 986.3 Mbp.
      • Core: 48.6% of genes and 80.1% of genome sequence.
      • Dispensable: 51.4% of genes and 19.9% of genome sequence.
  19. Core genome vs. dispensable genome.
      • The dispensable gene set was more variable than the core gene set, both structurally and functionally.
      • Dispensable genes have experienced weaker purifying selection and evolved more quickly than core genes.
      [Figure values: 8.86/kb and 19.93/kb.]
  20. Core genes were more functionally conserved among plant species than dispensable genes.
      • 58.3% of the dispensable genes could not be assigned any functional annotation, versus 33.9% of the core gene set.
      • 95.5% of core genes had homologs in other species, based on BLAST searches against 32 plant genomes (excluding soybean), significantly more than the dispensable gene set (83.5%; chi-square test, p < 0.01).
      • Lineage-specific genes evolved faster than genes shared between species, via either a higher evolutionary rate or a higher gene loss rate.
  21. Evolution of the G. max / G. soja species complex (670 conserved single-copy gene orthologs).
      • G. soja diverged from G. max more than 0.8 mya.
      • This is nearly 3 times older than a previous estimate of 0.27 mya, which was based on re-sequencing of a single G. soja genome.
  22. Section 4: Genomic variation between G. soja and G. max W82.
      • SNPs: 3.63 to 4.72 million.
      • Indels: 0.50 to 0.77 million.
      • Structural variation: CNV, PAV.
      Thousands of genes are affected by these variations, some of which may be useful for future crop improvement.
  23. G. soja vs G. max: genomic basis of agronomic traits. Photosensing and light signaling coordinately controlling flowering. Two 3-nt indels and 9 non-synonymous SNPs; two variation hotspots.
  24. Specific variations identified in this comparison (G. max vs G. soja).
      • Data sets: re-sequencing* (1 G. soja + 1 G. max); re-sequencing# (25 G. soja + 30 G. max); de novo sequencing (7 G. soja + 1 G. max).
      • *: from Kim et al. PNAS, 2010; #: from Li et al. BMC Genomics, 2013.

      Variation type           Data row 1  Data row 2  Data row 3
      G. soja-specific         ?           338         ?
      G. max-specific          712         16          ?
      CNV-loss                 ?           1179        ?
      CNV-gain                 ?           726         ?
      Large InDel (5-100 bp)   ?           15M         ?
      Small InDel (1-5 bp)     19.6M       85M         70M
      SNP missed in re-seq     ?           480M        ?
      SNP                      250M        510M        510M

      The three data rows are given in the order they appear in the slide text; which data set each row belongs to is not recoverable from it. "?" is as shown on the slide.
  25. More SNPs were found by the assembly-based method.
      • 9 SNPs in a 62 bp fragment.
      • 10 million SNPs, twice the number identified by re-sequencing (Li et al. BMC Genomics, 2013).
      New SNPs came mostly from divergent regions, where assembled sequences could be aligned but short sequencing reads are difficult to map.
  26. Copy number variation: 1,978 genes (1,179 loss; 726 gain; 73 gain and loss). R genes: category, G. soja > G. max; number, G. max > G. soja.
  27. Presence/absence variation (PAV): >100 bp and <95% identity.
      • PAV sequence: 30.3 Mb (G. soja-specific: 11.3 Mb; G. max-specific: 19 Mb).
      • PAV genes: 354 (G. soja-specific: 338; G. max-specific: 16).
  28. • PAV: 24.3% involved in defense response.
      • Gs1-3: biotic and abiotic stress tolerance or plant development.
      • 56 re-sequenced accessions: frequency G. soja > G. max.
      [Figure: Gs1, Gs2, Gs3; 8 kb scale.]
  29. Section 5: Selection genes during domestication. A population bottleneck or artificial selection results in the fixation of alleles during domestication. [Figure: wild vs cultivated, items labeled 1 to 5 in each.]
  30. Re-sequencing data (Li et al. BMC Genomics, 2013): G. soja, landraces, elite cultivars.
      • 25 accessions: 93.55 Gb; 98.2% Glyma1.01.
      • 31 accessions (Lam et al. 2010): 17 G. soja, 14 G. max.
      • Total: 5,102,244 SNPs; 25.5% specific to our accessions.
  31. Artificial selection.
      • 394 regions: 1.47% of the whole genome (950 M).
      • 928 genes: 2.0% of 46,430 predicted genes.
      • Statistics: θπ (cultivated/wild), Tajima’s D values, FST.
      • 20 kb sliding window (2 kb step size).
      [Figure: number of regions and number of genes per chromosome, Gm01 to Gm20.]
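
The window scan named on this slide can be sketched as follows, using the stated 20 kb window and 2 kb step and computing only the θπ (cultivated/wild) ratio. Everything else is an assumption made for illustration: the input is taken to be SNP positions with alternate-allele counts among `n` sampled chromosomes per group, per-site diversity uses the standard unbiased heterozygosity estimator, and the function names are hypothetical. Tajima’s D, FST and the talk's significance thresholds are not reproduced.

```python
def site_pi(alt_count, n):
    """Unbiased per-site diversity from an allele count among n chromosomes."""
    p = alt_count / n
    return 2 * p * (1 - p) * n / (n - 1)


def sliding_windows(chrom_length, size=20_000, step=2_000):
    """Yield half-open (start, end) windows that fit on the chromosome."""
    start = 0
    while start + size <= chrom_length:
        yield start, start + size
        start += step


def window_pi(positions, alt_counts, n, start, end):
    """Sum of per-site diversity in [start, end), divided by window length."""
    total = sum(
        site_pi(count, n)
        for pos, count in zip(positions, alt_counts)
        if start <= pos < end
    )
    return total / (end - start)


def pi_ratio_scan(positions, cult_counts, n_cult, wild_counts, n_wild, chrom_length):
    """Return (start, end, pi_cultivated / pi_wild) for each window.
    The ratio is None where the wild group shows no diversity."""
    results = []
    for start, end in sliding_windows(chrom_length):
        pi_wild = window_pi(positions, wild_counts, n_wild, start, end)
        pi_cult = window_pi(positions, cult_counts, n_cult, start, end)
        ratio = pi_cult / pi_wild if pi_wild > 0 else None
        results.append((start, end, ratio))
    return results


# Toy data: three SNPs on a 24 kb chromosome, four chromosomes per group.
positions = [1_000, 5_000, 21_000]
wild_counts = [2, 2, 2]
cult_counts = [0, 2, 2]
scan = pi_ratio_scan(positions, cult_counts, 4, wild_counts, 4, chrom_length=24_000)
for start, end, ratio in scan:
    print(start, end, ratio)
```

A low ratio in a window, as in the first toy window where one SNP is fixed in the cultivated group, is the kind of signal a scan like this flags as a candidate selected region.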
  32. • The distribution of selected regions was neither random nor uniform across the genome.
      • The regions formed apparent clusters in certain genomic regions.
      • This is similar to the distribution pattern of QTLs underlying domestication-related traits (Ross-Ibarra, Genetics of Adaptation, 2005).
      [Figure panels: Gm08, Gm12.]
  33. Selection gene: Glyma03g35520.1, a homolog of the rice domestication gene Grain Incomplete Filling 1 (GIF1).
      • GIF1 encodes a cell-wall invertase that regulates sugar levels to meet the demands of cell division and growth during grain development.
      • Increased grain size and weight in transgenic rice (from Wang et al. Nat Genet, 2008).
  34. Selection gene: Glyma03g35250.1.
      • GmTfl1 (Glyma19g37890.1): Tian et al. 2010; Liu et al. 2010.

                                 gDNA          cDNA
                                 θ      π      θ      π
      GmTfl1 (Glyma19g37890.1)
        Elite cultivars          1.86   0.98   0.98   0.52
        Landraces                1.78   1.05   1.78   1.61
        G. soja                  1.65   1.28   0      0
      Glyma03g35250.1
        G. max (89)              0      0      0      0
        Elite cultivars          0      0      0      0
        Landraces                0      0      0      0
        G. soja (20)             0.66   0.73   0.85   0.54

      • The homolog of Glyma03g35250.1 in sunflower experienced selective sweeps during evolution (from Blackman et al. 2011).
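
For readers unfamiliar with the two statistics in this table, here is a minimal sketch of the textbook per-site estimators, assuming the input is a list of equal-length aligned sequences with no gaps: Watterson's θ from the number of segregating sites, and π as the mean pairwise difference. The slide does not state how its values were computed or scaled, so the function names and toy sequences are illustrative only.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns with more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's theta per site: S / a1 / L, with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / len(seqs[0])


def nucleotide_diversity(seqs):
    """Pi per site: mean number of differences over all sequence pairs / L."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])


# Toy alignment: four hypothetical sequences of length 8.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(segregating_sites(toy))
print(watterson_theta(toy))
print(nucleotide_diversity(toy))
```

A gene with no polymorphism in a group gives θ = π = 0 under these estimators, which is how the all-zero G. max rows for Glyma03g35250.1 read.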
  35. Confirmed some regions or genes.
      • 100-seed weight: QTL by Yan et al. 2014, Plant Breeding.

      Type       No. of SNPs  No. of haplotypes  Haplotype diversity
      Total      72           32                 0.762
      G. soja    71           28                 0.952
      Landrace   29           5                  0.568
      Elite      3            4                  0.552

      [Figure panels: Total, Wild, Landrace, Elite.]
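
The haplotype diversity column can be illustrated with Nei's estimator, H = n/(n-1) × (1 - Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sampled haplotypes. That the talk used this particular estimator is an assumption, since the slide gives only the resulting values. The function name and the toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity for a list of haplotype labels or strings."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Toy sample: four haplotypes, three distinct.
sample = ["AAT", "AAT", "ACT", "GCT"]
print(haplotype_diversity(sample))
```

Values near 1 mean almost every sampled haplotype is distinct, and values near 0 mean one haplotype dominates. That is the direction of the drop from G. soja (0.952) to elite cultivars (0.552) in the table.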
  36. Soybean seed coat color: black (G. soja), diverse colors (landraces), yellow (elite cultivars). Multiple-allele I locus: CHS1, CHS3, CHS4, CHS5 and CHS9. [Figure: plot with y-axis 0 to 6 and x-axis -40000 to 160000; series labeled 1 to 20.]
  37. Summary.
      • The hierarchical genetic structure of soybean landraces reflected their geographic regions.
      • A pan-genome was constructed by de novo sequencing and assembly of seven G. soja accessions.
      • Inter-genomic comparisons identified up to 3,000 lineage-specific genes and genes with CNV, PAV or large-effect mutations, some of which may contribute to variation in agronomic traits such as resistance, seed composition, flowering time and biomass.
      • A set of candidate genes significantly affected by selection for preferred agricultural traits underlying soybean domestication was identified, and some genes were confirmed.
      • These results will facilitate the harnessing of untapped genetic diversity from wild soybean for developing elite cultivars.
  38. Funding: National Natural Science Foundation of China; State Key Basic Research and Development Plan of China (973); National Key Technologies R&D Program in the 11th Five-Year Plan (863). Acknowledgments: Novogene (Prof. Ruiqiang Li, Guangyu Zhou, Wenkai Jiang, Zhouhuao Zhang); University of Georgia (Prof. Scott A. Jackson); Purdue University (Dr. Jianxin Ma).
