The document discusses tropical maize genomics, outlining what is currently known about tropical maize genomes from projects like the maize HapMaps. It describes how genomic information can be used to unlock genetic variation in tropical maize germplasm and drive molecular breeding efforts through approaches like genome-wide association studies, marker-assisted selection, and the development of multiple panels of SNP markers. The document also explores how plant breeding will increasingly be driven by big data and artificial intelligence.
Tropical maize genome: what do we know so far and how to use that information
1. Tropical maize genome:
what do we know so far and how to use that information
Yunbi Xu
CIMMYT-China, Institute of Crop Science, CAAS, Beijing, China
13th Asian Maize Conference and Expert Consultation on
“Maize for Food, Feed, Nutrition and Environmental Security”
Radisson Blu Hotel, Ludhiana, India, October 8‐10, 2018
2. Outline
Introduction
Why tropical maize genomics
What do we know so far about tropical
maize genomes
What can we do with the genomics
information from tropical maize
Molecular breeding driven by big data
and artificial intelligence
3. (1) An intermediate genome size compared to rice and wheat;
(2) Typical outbreeding system with flexibility for inbreeding;
(3) Multiple breeding products (inbreds, hybrids, synthetic
varieties, open pollinated varieties and improved landraces);
(4) Wide adaptability, especially for stressed environments;
(5) Multiple-purpose crop: 5Fs
food (grain), feed (grain and stalk), fuel (grain and stalk),
forage (young grain and stalk), and
fruit (sweetcorn, baby corn, fresh corn)
Xu and Crouch 2008
In: Genomics of
Tropical Crop Plants
Maize as an economically important crop and
suitable as an experimental model
4. Temperate maize in cooler climates beyond 34°N and 34°S
Tropical maize in warmer environments located between
30°N and 30°S latitudes
Lowland (sea level to ≤1000 masl)
Mid-altitude (1000 to 1600 masl)
Highland (≥1600 masl)
Subtropical maize between the 30° and 34° latitudes.
Maize is adapted to diverse environments
Tropical maize is grown in over 60 countries, occupying about
60% of the area harvested and representing 40% of the world
production.
6. Most tropical germplasm may not be
considered a manageable resource, as it
can be too diverse to be used directly
and has to undergo a pre-breeding
process.
Utilization of genetic variation
in tropical maize
Xu and Crouch 2008
In: Genomics of Tropical Crop Plants
Genetic variation can be unlocked from
tropical maize germplasm through
genetic approaches such as large-scale
and systematic identification and
characterization of QTL
7. Outline
Introduction
Why tropical maize genomics
What do we know so far about tropical
maize genomes
What can we do with the genomics
information from tropical maize
Molecular breeding driven by big data
and artificial intelligence
8. Genomic Gaps Left by Reference Genomes
Single genomes that have been sequenced up to 90% are used as
reference genomes
Resequencing indicates that 20-50% of the original reads from
different ecotypes cannot be mapped to the reference genome
50% of HiSeq reads from tropical maize cannot be mapped,
while only 20% of the SNPs from landraces can be mapped
to the B73 reference (Peter Wenzel, CIMMYT)
Transcriptome sequencing of 503 maize inbred lines identified
8681 representative transcripts
16.4% were expressed in all lines
50% were absent from the B73 reference (Hirsch et al. 2014)
9. Single-genome based references provide only partial
genome coverage, which results in
Getting lost in map-based cloning
Missing 40% or more of the important QTL/genes in association mapping (AM)
Biased estimation (ascertainment bias)
Genetic diversity
Population structure
LD and IBD
Haplotypes
MAS
Inefficient procedures
Unpredictable results
The Results of Partial Genome Coverage
10. A multiple genome-based pangenome is needed for unlocking
genetic variation that is hidden in diverse tropical maize
collections, to provide a complete profile of genetic variation,
including favorable alleles and haplotypes, at various omics
levels across elites, landraces, and wild relatives.
Technology transfer from temperate maize to tropical maize and
capacity building in tropical countries are needed for
improvement of tropical maize.
Comparative genomics across tropical maize germplasm and
temperate maize will help identify novel genes and alleles
required for improvement of both temperate and tropical maize.
More attention should be given to
tropical maize genomes
11. The concept of haplotype
SNPs
SNP1 SNP2 SNP3
Chromosome 1 AACACGCCA …. TTCGGGGTC …. AGTCGACCG ….
Chromosome 1 AACACGCCA …. TTCGAGGTC …. AGTCAACCG ….
Chromosome 1 AACATGCCA …. TTCGGGGTC …. AGTCAACCG ….
Chromosome 1 AACACGCCA …. TTCGGGGTC …. AGTCGACCG ….
Haplotypes
Haplotype 1
Individual 01 CTCAAAGTACGGTTCAGGCA
Individual 02 CTCAAAGTACGGTTCAGGCA
Individual 03 CTCAAAGTACGGTTCAGGCA
Haplotype 2
Individual 04 CTCAAAGCACGGTTGAGGCA
Individual 05 CTCAAAGCACGGTTGAGGCA
Individual 06 CTCAAAGCACGGTTGAGGCA
Haplotype 3
Individual 07 CTCGAAGTACGGTTCAGGCA
Individual 08 CTCGAAGTACGGTTCAGGCA
Individual 09 CTCGAAGTACGGTTCAGGCA
Haplotype 4
Individual 10 CTCAAAGCACGGTTCAGGCA
Individual 11 CTCAAAGCACGGTTCAGGCA
Individual 12 CTCAAAGCACGGTTCAGGCA
Tag SNPs: A/G, T/C, C/G
Revised from Xu et al 2017. J Exp Bot 68: 2641–2666
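The haplotype idea on this slide can be checked with a short script. The sketch below takes the twelve individual sequences listed above, finds the variable positions (SNPs), groups individuals that share the same allele combination into haplotypes, and then searches by brute force for the smallest set of SNPs that still separates all haplotypes. Treating "tag SNPs" as that smallest separating set is a simplifying assumption made here for illustration; practical tag-SNP selection is usually based on linkage disequilibrium. The function names are hypothetical.

```python
from itertools import combinations

# The twelve sequences from the slide, Individual 01 to Individual 12.
sequences = (
    ["CTCAAAGTACGGTTCAGGCA"] * 3    # Individuals 01-03
    + ["CTCAAAGCACGGTTGAGGCA"] * 3  # Individuals 04-06
    + ["CTCGAAGTACGGTTCAGGCA"] * 3  # Individuals 07-09
    + ["CTCAAAGCACGGTTCAGGCA"] * 3  # Individuals 10-12
)


def snp_positions(seqs):
    """Return 0-based positions where at least two sequences differ."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def group_haplotypes(seqs, positions):
    """Map each allele combination at `positions` to 1-based individual numbers."""
    groups = {}
    for number, seq in enumerate(seqs, start=1):
        key = "".join(seq[p] for p in positions)
        groups.setdefault(key, []).append(number)
    return groups


def minimal_tag_snps(seqs, positions):
    """Smallest subset of SNP positions that still separates every haplotype.

    Brute force over subsets; only suitable for a handful of SNPs.
    """
    n_haplotypes = len(group_haplotypes(seqs, positions))
    for size in range(1, len(positions) + 1):
        for subset in combinations(positions, size):
            if len(group_haplotypes(seqs, subset)) == n_haplotypes:
                return list(subset)
    return list(positions)


snps = snp_positions(sequences)
haplotypes = group_haplotypes(sequences, snps)
tags = minimal_tag_snps(sequences, snps)

print("SNP positions (0-based):", snps)
for alleles, members in haplotypes.items():
    print(alleles, "-> individuals", members)
print("Tag SNP positions:", tags)
```

For this small example no pair of SNPs separates all four haplotypes, so all three variable positions are needed as tags, which matches the three tag SNPs shown on the slide.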
12. Contribution of tropical maize genomics to
potential-increasing and gap-closing
Xu et al. 2017.
J. Exp. Bot.
68: 2641–2666
13. Outline
Introduction
Why tropical maize genomics
What do we know so far about tropical
maize genomes
What can we do with the genomics
information from tropical maize
Molecular breeding driven by big data
and artificial intelligence
14. Maize HapMaps
HapMap 1 Gore et al 2009 Science 326: 1115-1117
27 diverse maize lines (7 tropical lines)
Array: MaizeSNP50 developed
56,110 SNPs chosen from >840,000 SNPs
Covering 2/3 of predicted genes
HapMap 2
Chia et al 2012
Nat Genet 44: 803-807
103 maize lines
25 tropical inbreds
23 landraces
19 wild relatives
55M SNPs
16. Pan-genomes
Building pan-genome sequence anchors using genetic mapping
approaches combined with a machine learning algorithm
14,129 maize inbred lines; 26 M tags
Lu et al 2015 Nature Communications 6:6914
17. Pangenomic information can be incorporated into databases,
through which the core genome, variable genome and expression
levels can be linked (Golicz et al. 2016).
The exome is many times smaller than the whole genome,
making exome sequencing data more easily manageable and
applicable in plant breeding (Warr et al. 2015).
Pangenomic information and exome can be integrated with other
functional genomics approaches and used to discover genes and
their functions.
Pangenomic information and the exome
18. 30133 SNPs from the 600K Affymetrix
Axiom Maize Genotyping Array
1068 SNPs from the Illumina
MaizeSNP50 BeadChip
9395 SNPs from RNA-seq data of
368 maize lines
4067 SNPs for filling gaps in the B73
reference
734 SNPs for heterotic grouping
132 SNPs for transgenic events
Zou et al 2017
Mol Breed 37: 20
In collaboration with
CapitalBio Technology
and Daxiong Seed Co.
A 55K-SNP chip with improved genome coverage
by large-scale resequencing tropical maize
19. Selection and development of 20K SNP markers from the 55K SNP
array with improved genome coverage
Target sequences are enriched by in-solution probes
Using the same panel of 20K SNP markers to generate 10K, 5K
and 1K SNP markers by sequencing at different depths
Test and validation using two genotype panels:
96 diverse maize germplasm from China, USA and CIMMYT
387 breeding lines generated in CAAS maize breeding programs
Suitability: 50-40000 SNPs, SSRs or InDels
Genotyping cost: 10 USD/2K, 20 USD/20K, 30 USD/40K
Developed by CIMMYT-China, Institute of Crop Science (CAAS)
and Shijiazhuang Molecular Breeding Incorporation
Multiple panels of SNP markers developed by
Genotyping by target sequencing (GBTS)
The system can be used for all organisms (plants and animals)
Guo et al 2018 in preparation
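The genotyping costs quoted on this slide are easier to compare when expressed per 1000 markers. The sketch below does that arithmetic, assuming (this is not stated on the slide) that each figure is the cost per sample for a panel of the given size. The variable names are illustrative only.

```python
# (panel size in markers, quoted cost in USD), as listed on the slide.
# Assumption: each cost is per genotyped sample.
panels = [(2_000, 10), (20_000, 20), (40_000, 30)]

# Cost per 1000 markers for each panel size.
cost_per_1k = {size: cost * 1000 / size for size, cost in panels}

for size, usd in cost_per_1k.items():
    print(f"{size:>6} markers: {usd:.2f} USD per 1000 markers")
```

Under that assumption the cost per 1000 markers falls from 5.00 USD for the 2K panel to 1.00 USD for 20K and 0.75 USD for 40K, so the larger panels are cheaper per data point even though the total cost per sample rises.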
20. Potential applications of the marker panels in genomics,
genetics and plant breeding
Applications ≥20K* 10K 5K 1K <200
MAGE +++ ++ + +
Heterotic grouping +++ +++ ++ ++
Marker linkage map construction +++ ++ + +
Linkage mapping for major traits +++ +++ ++ +
Genome-wide association study ++ +
Selection in selfed populations +++ +++ +++ +++ +++
Gene transfer by backcrossing +++ +++ +++ ++ +
Gene pyramiding +++ +++ ++ + +
Variety protection and IP issues +++ +++ ++ ++ +
MAGE: marker-assisted germplasm evaluation, including differentiating cultivars and
classifying inbred lines into heterotic or ecological groups; identifying gaps and
redundancy in germplasm collections; monitoring genetic shifts that occur during
germplasm conservation, regeneration, domestication, and breeding; identifying novel
and superior alleles for improvement of agronomic traits; and constructing
representative subsets or core collections (Xu 2003, 2010).
Guo et al 2018 in preparation
21. Outline
Introduction
Why tropical maize genomics
What do we know so far about tropical
maize genomes
What can we do with the genomics
information from tropical maize
Molecular breeding driven by big data
and artificial intelligence
22. Dissect the genetic structure of their germplasm to
understand gene pools and germplasm (heterotic) groups
Provide insights into allelic content of potential germplasm
for use in breeding
Screen early generation breeding populations to select
segregants with desired combinations of marker alleles
associated with beneficial traits (in order to avoid costly
phenotypic evaluations)
Establish genetic identity (fingerprinting) of their products
The applied genomics information and tools
routinely used by large multinational seed companies
are also utilized by small and medium-sized seed companies
and developing countries for breeding tropical maize
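The screening step listed above (selecting segregants that carry the desired combination of marker alleles) amounts to a simple filter over genotype records. The following is a minimal sketch with made-up data: the marker names, plant identifiers and target genotypes are all hypothetical and are not taken from the presentation.

```python
# Hypothetical target: the genotype wanted at each trait-associated marker.
desired = {"markerA": "GG", "markerB": "TT"}

# Hypothetical early-generation population: plant ID -> marker genotypes.
population = {
    "F2-001": {"markerA": "GG", "markerB": "TT"},
    "F2-002": {"markerA": "AG", "markerB": "TT"},
    "F2-003": {"markerA": "GG", "markerB": "CT"},
    "F2-004": {"markerA": "GG", "markerB": "TT"},
}


def carries_all(genotype, targets):
    """True if the plant has the target genotype at every target marker."""
    return all(genotype.get(marker) == wanted for marker, wanted in targets.items())


# Keep only plants that match at every marker; the rest need no phenotyping.
selected = sorted(pid for pid, geno in population.items() if carries_all(geno, desired))
print(selected)
```

Only the plants that pass the filter would go forward to field evaluation, which is how marker screening reduces the amount of costly phenotyping.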
23. GWAS using Multi-Hybrid Populations
Resequenced maize inbreds can be crossed with each other to
develop multiple hybrid populations for GWAS, with the hybrid genotypes
inferred from the sequenced parental lines.
The parental lines can be easily shared and used to produce different
subsets of multiple hybrids, depending on the target genes and the
objectives of discovery.
An example has been provided for GWAS of flowering time using 55K SNP
markers and 724 hybrids (Wang et al 2017).
Similarly, two approaches, geographic associations and F-one association
mapping (FOAM), were integrated to characterize the diversity of 4,471
maize landraces, with 1,005 genes identified across 22 environments
(Navarro et al 2017).
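Inferring hybrid genotypes from sequenced parents, as described above, can be sketched in a few lines. The example assumes that every parent is a fully homozygous inbred, so that each parent contributes exactly one known allele per marker to the F1; residual heterozygosity and genotyping errors are ignored. The inbred names, marker names and alleles are hypothetical.

```python
from itertools import combinations


def infer_hybrid(parent_a, parent_b):
    """Infer F1 genotypes from two homozygous inbred parents.

    Each parent maps marker name -> the single allele it carries.
    The hybrid gets one allele from each parent at every shared marker;
    alleles are sorted so that "AG" and "GA" are written the same way.
    """
    shared = sorted(parent_a.keys() & parent_b.keys())
    return {m: "".join(sorted(parent_a[m] + parent_b[m])) for m in shared}


# Hypothetical resequenced inbreds (marker -> homozygous allele).
inbreds = {
    "P1": {"snp1": "A", "snp2": "G", "snp3": "C"},
    "P2": {"snp1": "A", "snp2": "T", "snp3": "C"},
    "P3": {"snp1": "G", "snp2": "T", "snp3": "T"},
}

# Every pairwise cross among the inbreds, without reciprocals.
hybrids = {
    f"{a}x{b}": infer_hybrid(inbreds[a], inbreds[b])
    for a, b in combinations(sorted(inbreds), 2)
}

for name, genotype in hybrids.items():
    print(name, genotype)
```

Because n inbreds give n(n-1)/2 possible single crosses, a modest set of sequenced parents yields a large hybrid panel without genotyping any hybrid directly.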
24. Navarro et al 2017
Nat Genet 49: 476-480
Significance for
flowering time, and
overlap between
flowering time and
latitude- and altitude-
associated SNPs.
25. Target traits
Yield and heterosis
Quality (e.g., QPM from tropical maize)
Abiotic stresses (with tropical maize as donors)
Biotic stresses
Diseases as examples
Global diseases (most maize growing environments): leaf
blights, leaf rusts, leaf spots, stalk rots and ear rots.
Regional diseases
Asia - downy mildews, which are also spreading to some parts
of Africa and the Americas;
Africa – MLN, maize streak virus and the parasitic weed Striga;
Latin America - maize stunt and tar spot.
Marker-assisted recurrent selection
and genomic selection for tropical maize
26. Integrated plant breeding platforms
for breeding tropical maize
Efficient breeding pipeline
Integrated DH and MAS procedures
Excellence in Breeding resources
Open-source breeding networks
Xu et al 2017. J Exp
Bot 68: 2641–2666
27. Maize molecular breeding initiatives in China
supported by integrated plant breeding platforms
Tongzhou International Seed GS Breeding Initiatives
Genotyping 100 GS populations supported by Beijing
governmental funds
One + Eight Breeding Initiatives
Including one institute (Institute of Crop Science, CAAS) and
eight seed companies, with genotyping cost subsidized by
Ministry of Agriculture and Rural Affairs
Jiusuo Breeding Initiatives
Seed companies in the winter nursery Sanya, China, through
fully open source breeding by sharing everything
28. Outline
Introduction
Why tropical maize genomics
What do we know so far about tropical
maize genomes
What can we do with the genomics
information from tropical maize
Molecular breeding driven by big data
and artificial intelligence
29. Plant breeding is increasingly driven by big data
Medium: field book => Excel => databases
Scale: k => m => b => t
Dimension: one (phenotype) => two (phenotype + genotype) => three (phenotype + genotype + envirotype) => four (phenotype + genotype + envirotype + time)
Throughput (data generated in one experiment or unit time): 1 => 100 (1*96) => 10000 (96*96) => 1M (384*3072) => 100M (384*300K)
Precision: repeatability, duplicability, compatibility, additivity, predictability
Data revolution
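The throughput figures on this slide are rounded orders of magnitude. The short calculation below works out the exact products, reading each pair as samples multiplied by markers per run; that reading is an assumption, since the slide gives only the numbers.

```python
# Each entry: (label as written on the slide, first factor, second factor).
# Assumed reading: samples x markers in one experiment.
steps = [
    ("1*96", 1, 96),
    ("96*96", 96, 96),
    ("384*3072", 384, 3072),
    ("384*300K", 384, 300_000),
]

# Exact number of data points for each step.
data_points = {label: a * b for label, a, b in steps}

for label, total in data_points.items():
    print(f"{label:>9} = {total:,} data points")
```

The exact values (96; 9,216; 1,179,648; 115,200,000) agree with the slide's rounded figures of roughly 100, 10000, 1M and 100M.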
30. Multi-omics data
Multi-phenotypic data
Multi-environmental data
Integrated data
Empirical breeding
Selection indices
Parental selection and mating
Combining ability
Heterosis and hybrid performance
Parental relationship
Genetic distance
Long-term selection data
Growth and development
Dynamic changes
Varietal transition
Quality and nutrients
Abiotic stresses
Biotic stresses
… … …
Multiple sources of plant breeding data
31. Xu 2016 Theor Appl
Genet 129: 653–673
More attention should be given to envirotypic data
32. Experimental design and data analysis are more costly
and time consuming, compared to data generation
CIMMYT and donors are eager
to maximize the use and impact of data
Kate Drehe 2013
CIMMYT Science Week
33. AI-assisted breeding systems will play significant roles in
theoretical study, evaluation, selection, breeding procedure
development, and field management.
AI will have a significant influence on breeding information
systems because AI-equipped robots will interact with all the
processes relevant to data collection, storage, analysis, sharing
and utilization.
AI systems will benefit from the historical experience and relevant
knowledge achieved and accumulated in breeding programs.
A breeding system driven by big data and AI will have great
capacity for design and prediction in breeding programs, with
improved breeding efficiency and enhanced genetic gain,
through machine learning, optimization and simulation.
Plant breeding will be driven by artificial intelligence
34. Acknowledgements
ccMaize Group: Wen-Xue Li, Cheng Zou, Shanhong Wang
CIMMYT: Daniel Jeffers, Mike Olsen, B. M. Prasanna
International Collaborators
CAAS, Sichuan Agric Univ, China Agric Univ
Cornell University, Cold Spring Harbor Laboratory
Funds
CGIAR MAIZE
Bill and Melinda Gates Foundation
China National Natural Science Foundation
Ministry of Science and Technology of China
The Agricultural Science and Technology Innovation Program
(ASTIP) of Chinese Academy of Agricultural Sciences
Thank you for your attention