Pangenomics.pptx

UNIVERSITY OF AGRICULTURAL SCIENCES, BANGALORE
DEPARTMENT OF GENETICS AND PLANT BREEDING
PRESENTED BY:
MARUTHI PRASAD B P
ID No.: PAMB 1066

POPULATION INCREASE! CLIMATE CHANGE!
PEST AND DISEASE OUTBREAKS!
• Capturing maximum genetic variation
• Time and accuracy
• Understanding the crop genome

Reference genome: a tool that serves as a base for crop improvement
Numerous sequencing efforts have been undertaken in plants and, as a result, reference genome sequences have become available for several crops, serving as a base for crop improvement efforts.
Tao et al. (2019)

Is a single reference genome adequate?
SINGLE REFERENCE GENOME
Single-reference-genome-oriented comparative genome analysis
What if our reference genome is too incomplete to capture all of the information?

Limitations of using a single reference genome
• Incomplete representation of genetic diversity
• Biases in the reference genome
• Dynamic nature of the genome

BEYOND A SINGLE REFERENCE GENOME!
A single reference is insufficient to fully capture the diversity within a crop species.
Multiple reference genomes facilitate exploration of genetic diversity.
Bayer et al. (2020)
The move from a single reference genome to multiple reference genomes will better illuminate the mining of genetic diversity for crop improvement by providing a more precise and comprehensive guiding principle.

Outline of Presentation
Introduction
Pangenome
Structural variants
How is a pangenome generated?
Case studies
Utilizing pangenomics for crop improvement
Current developments in pangenomics
Super pangenome
Challenges
Future prospects
Conclusions

What is a pangenome?
Genomic data derived from multiple accessions and cultivars → full extent of sequence variation within a species → a pan-genomic approach to discover new genes and alleles directly related to phenotype
“A pangenome refers to the full complement of genes of a biological clade, such as a species, which can be partitioned into a set of core genes that are shared by all individuals and a set of dispensable genes that are partially shared or individual specific.”

Pangenome
• Core genes
• Dispensable genes
• Orphan genes
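A minimal sketch of how pan-genes can be partitioned, assuming a hypothetical presence/absence (PAV) table that maps each gene to the set of accessions carrying it. Core genes are present in every accession and dispensable genes in some but not all; accession-private genes (present in exactly one accession) are reported as a subset of the dispensable set. Orphan genes are defined by the lack of homologs in other lineages, which requires a homology search and is not captured by a PAV table alone. All names below are illustrative.

```python
def partition_pangenome(pav, accessions):
    """Classify pan-genes by how many accessions carry them.

    pav: dict mapping gene ID -> set of accession names carrying the gene
    accessions: set of all accession names in the panel
    """
    n = len(accessions)
    core, dispensable, private = [], [], []
    for gene, carriers in pav.items():
        k = len(carriers & accessions)  # ignore accessions outside the panel
        if k == n:
            core.append(gene)            # shared by all individuals
        elif k >= 1:
            dispensable.append(gene)     # partially shared or individual-specific
            if k == 1:
                private.append(gene)     # accession-private subset
    return {
        "core": sorted(core),
        "dispensable": sorted(dispensable),
        "private": sorted(private),
    }


# Toy example with three hypothetical accessions
panel = {"acc1", "acc2", "acc3"}
pav = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc2"},
    "geneC": {"acc3"},
    "geneD": {"acc1", "acc2", "acc3"},
}
parts = partition_pangenome(pav, panel)
```

Here geneA and geneD are core, geneB and geneC are dispensable, and geneC is also accession-private.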

Difference between core genes and dispensable genes
Core genes | Dispensable genes
Highly conserved | More variable
Low ratio of non-synonymous to synonymous mutations | High mutation rate
Highly conserved functionally | Less functionally conserved
Housekeeping genes | Adaptive/defense response genes
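The non-synonymous to synonymous ratio in the table is usually written ω = dN/dS (Ka/Ks). A minimal sketch, assuming the numbers of non-synonymous and synonymous sites and differences have already been counted from a codon alignment (as in the Nei-Gojobori method), with a Jukes-Cantor correction applied to each proportion. Values of ω well below 1 indicate purifying selection, as expected for conserved core genes. The function names and example counts are illustrative only.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_sites, s_sites, n_diffs, s_diffs):
    """omega = dN/dS from precomputed site and difference counts (assumed inputs)."""
    dn = jukes_cantor(n_diffs / n_sites)  # non-synonymous distance
    ds = jukes_cantor(s_diffs / s_sites)  # synonymous distance; must be > 0
    return dn / ds


# Hypothetical gene pair: 300 non-synonymous and 100 synonymous sites
omega = dn_ds(n_sites=300, s_sites=100, n_diffs=3, s_diffs=10)
```

For these made-up counts ω is roughly 0.09, i.e. strong purifying selection.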

When and who?
✔ Herve Tettelin and Duccio Medini: pangenomes were first introduced by Tettelin et al. in 2005 to describe gene diversity in Streptococcus agalactiae.
✔ Michele Morgante: pangenomics in plants was first proposed by Morgante et al. (2007).

Timeline of developments in pangenomic research
[Figure: timeline of pangenomic research, 2005 to 2019]
• 2005: the pangenome introduced by Tettelin
• 2007: plant pangenome concept proposed by Morgante et al.
• Other milestones shown (2006 to 2019): human pangenome; bacterial upper kingdom pangenome; pangenome of the phytoplankton Emiliania huxleyi; review of analytical tools and models developed over 10 years of pangenome research (Vernikos et al., 2015); E. coli pangenome built using 1,085 genomes; rice accessory genome characterized; pangenomes of bread wheat and stiff brome; human pangenome; pig pangenome; Streptococcus pneumoniae pangenome; Escherichia coli pangenome; soybean and wild relatives pangenome; maize transcriptome; Brassica oleracea pangenome; poplar genome; rice pangenome built using 3,010 accessions; Saccharomyces cerevisiae pangenome built using 10 isolates
Golicz et al. (2019)

Timeline of developments in pangenomic research (contd.)
• “Map-to-pan” strategy
• First graph-based plant pan-genome constructed in soybean

MAJOR DRIVING FORCES FOR SVs UNDERLYING THE VARIABLE SEQUENCES OF PLANT PAN-GENOMES

1. Transposable elements: insertion of TEs in regulatory regions (e.g. tb1, Vgt1, ZmCCT10 and ZmCCT9 in maize)

2. Non-allelic homologous recombination (NAHR)

3. Genetic introgression/horizontal gene transfer (HGT)

4. Biased gene loss (fractionation) in polyploid plants

How is a pangenome generated?
Li et al. (2022)

Pan-genome workflow
1. Selection of germplasm
2. Identification of genetic variation
3. Genotyping
4. Linking genotypic and phenotypic variation

Selecting germplasm for a sequence assembly

Sequencing: NGS technology vs. TGS technology

Assembly errors (e.g. a missing gene)

Approaches: de novo assembly, iterative assembly and graph assembly

De novo assembly
• Error prone
• Costly
• High-quality data with high sequencing coverage is required

De novo genome assembly
1. Short/long reads
2. Contig assembly
3. Scaffold/chromosome assembly
4. Multiple alignment of genomic regions
5. Pan-genome construction

• Less expensive
• Requires much less data
• Permits the assessment of large numbers of individuals with relatively low sequencing coverage
Changes in the gene order

1. Mapping of the reads to the reference genome sequence
2. Assembly of the unmapped reads
3. Building the pangenome
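The three iterative-assembly steps can be illustrated with a toy loop. This is a sketch under strong simplifying assumptions: exact substring matching stands in for a real read aligner, a naive greedy overlap merge stands in for a real assembler, and the function names and sequences are hypothetical. Each accession's reads are mapped to the growing pangenome, and only the unmapped reads are assembled and appended, so sequence absent from the reference is added once.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Naive greedy assembler: repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


def iterative_pangenome(reference, accessions, min_overlap=3):
    """Map reads to the current pangenome, assemble unmapped reads, add the contigs."""
    pan = [reference]
    for reads in accessions.values():
        # "Mapping" stand-in: a read maps if it occurs exactly in any pangenome sequence
        unmapped = [r for r in reads if not any(r in seq for seq in pan)]
        pan.extend(greedy_assemble(unmapped, min_overlap))
    return pan


reference = "ACGTACGTTT"
accessions = {
    "acc1": ["ACGTAC", "GGGCCCAA", "CCCAATTT"],  # last two reads are absent from the reference
    "acc2": ["CGTTT", "GGGCCCAAT"],              # already represented in the pangenome
}
pan = iterative_pangenome(reference, accessions)
```

In this toy run acc1 contributes one new contig, while acc2 adds nothing because all of its reads already occur in the reference or in acc1's contig.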

Graph assembly
A graph structure represents the diversity of genomic sequences, presenting variation across multiple genomes as different paths along a graph of sequence or variant nodes.

Steps involved in graph-based pangenome assembly
1. Read pre-processing
2. K-mer construction
3. Graph traversal
4. Pangenome construction
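A minimal sketch of the k-mer construction and graph traversal steps, assuming error-free input sequences (i.e. read pre-processing already done) and using a coloured de Bruijn graph as one possible graph model: nodes are (k-1)-mers, edges are k-mers, and each edge records which genomes contain it. Shared edges form the common path and genome-specific edges form the alternative paths that represent variation. The toy genomes and function names are hypothetical.

```python
from collections import defaultdict


def build_coloured_dbg(genomes, k):
    """Edges are k-mers linking (k-1)-mer nodes; each edge stores the genomes (colours) containing it."""
    edges = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])].add(name)
    return edges


def core_kmers(edges, genomes):
    """k-mers shared by every genome (the core part of the graph)."""
    everyone = set(genomes)
    return {u + v[-1] for (u, v), colours in edges.items() if colours == everyone}


def walk(edges, genome, start):
    """Spell one genome's path by following its coloured edges while the path is unambiguous."""
    out = defaultdict(list)
    for (u, v), colours in edges.items():
        if genome in colours:
            out[u].append(v)
    path, node, used = start, start, set()
    while len(out[node]) == 1 and (node, out[node][0]) not in used:
        nxt = out[node][0]
        used.add((node, nxt))
        path += nxt[-1]
        node = nxt
    return path


genomes = {"g1": "ACGTTGCA", "g2": "ACGTAGCA"}  # hypothetical genomes differing at one base
graph = build_coloured_dbg(genomes, k=3)
```

Walking the graph from "AC" along the g1 or g2 colour recovers each input sequence, while the two genomes share only the k-mers flanking the variant.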

Software used in graph-based pangenome construction

Ratio of core vs dispensable genes
Crop species | Pan-genome size | Core (%) | Dispensable (%) | Reference
Triticum aestivum | 140,500 genes | 57.70 | 42.30 | Montenegro et al., 2017
Glycine soja | 59,080 gene families | 48.60 | 51.40 | Li et al., 2014
Zea mays | 41,903 transcripts | 39.12 | 60.88 | Hirsch et al., 2014
Brassica oleracea | 61,379 genes | 81.29 | 18.71 | Golicz et al., 2016
Brassica rapa | 36,882 genes | 84.56 | 15.44 | Lin et al., 2014
Brachypodium distachyon | 37,886 genes | 55.00 | 45.00 | Gordon et al., 2017
Helianthus annuus | 61,205 genes | 73.00 | 27.00 | Hubner et al., 2018
Higher ploidy and higher outcrossing rates provide an extra level of diversity and therefore a larger pangenome with a higher percentage of dispensable genes.

Case study 1
Objective:
To construct a chickpea pan-genome that provides insights into species divergence and the migration of the cultigen (C. arietinum), and to identify the rare allele burden and fitness loss in chickpea.
Varshney, R. K. et al. (2021)

Results
• The chickpea pan-genome (592.58 Mb) was developed using an iterative mapping and assembly approach.
• A total of 29,870 genes were identified, of which 1,582 were novel compared with previously reported genes.
• Gene ontology (GO) annotations identified genes involved in response to oxidative stress, response to stimulus, heat shock proteins, cellular response to acidic pH and response to cold, suggesting a possible role in adaptation.
• The modelling curve analysis showed that the chickpea pan-genome is closed.
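The source does not say which model was fitted. One common way to call a pangenome open or closed is Heaps' law, as applied to bacterial pangenomes by Tettelin et al. (2008): the number of new genes contributed by the N-th genome is fitted as n = κN^(-α), where α > 1 suggests a closed pangenome and α ≤ 1 an open one. A minimal sketch, assuming a hypothetical mapping from each genome to its set of genes; the panel below is invented purely to exercise the code.

```python
import math
import random


def new_gene_curve(genes_by_genome, permutations=100, seed=0):
    """Mean number of new genes contributed by the N-th genome added, over random genome orders."""
    names = list(genes_by_genome)
    rng = random.Random(seed)
    totals = [0.0] * len(names)
    for _ in range(permutations):
        order = names[:]
        rng.shuffle(order)
        seen = set()
        for i, name in enumerate(order):
            new = genes_by_genome[name] - seen
            totals[i] += len(new)
            seen |= new
    return [t / permutations for t in totals]


def heaps_alpha(curve):
    """Fit n = kappa * N**(-alpha) by least squares on log-log scale (N >= 2) and return alpha."""
    pts = [(math.log(n), math.log(v)) for n, v in enumerate(curve, start=1) if n >= 2 and v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with new genes to fit Heaps' law")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    return -slope


# Hypothetical panel: 10 shared genes plus one private gene per genome
core = {f"core{i}" for i in range(10)}
panel = {f"acc{j}": core | {f"private{j}"} for j in range(5)}
curve = new_gene_curve(panel)
alpha = heaps_alpha(curve)  # ~0 here (alpha <= 1): every genome keeps adding a gene, i.e. open
```

A closed pangenome such as the one reported for chickpea would instead show the number of new genes dropping quickly towards zero, giving α > 1.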

• Cultivated (2,258) and C. reticulatum (22) accessions were analysed to discover structural variations relative to the CDC Frontier genome.
• More structural variations were found in the C. reticulatum accessions because of their high divergence from cultivated chickpea.
• They further identified 793 gene-gain copy number variants (CNVs) and 209 gene-loss CNVs in cultivated accessions, and 643 gene-gain and 247 gene-loss CNVs in C. reticulatum accessions.

The past effective population size of chickpea was reconstructed from 150 randomly chosen cultivated genotypes using the sequentially Markovian coalescent approach implemented in SMC++ (Terhorst et al., 2017).
1. Chickpea experienced a strong bottleneck beginning around 10,000 years ago.
2. The population size reached its minimum around 1,000 years ago.
3. This was followed by a very strong expansion of the population within the last 400 years, suggesting a strong recent expansion of chickpea agriculture.

• The neighbour-joining tree shows a clear out-grouping of wild species accessions from cultivated accessions.
• The cultivated accessions formed three distinct clusters.
• One landrace from East Africa (ICC 16369) grouped together with the wild species accessions, indicating that it was mislabelled as cultivated chickpea.
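The source does not state which distances or software were used to build the tree. To illustrate the neighbour-joining algorithm itself (Saitou and Nei, 1987), here is a minimal sketch on a small made-up distance matrix; in practice distances would be computed from genotype data and an established phylogenetics package would be used. The internal node names ("node1", ...) are arbitrary.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Return tree edges (parent, child, branch_length) built by neighbour-joining."""
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Pick the pair minimising the Q criterion
        best = None
        for i, j in combinations(nodes, 2):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
        _, i, j = best
        counter += 1
        u = f"node{counter}"  # new internal node joining i and j
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # join the last two nodes
    return edges


labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(labels, dist)
```

Because this toy matrix is additive (tree-like), neighbour-joining recovers its tree exactly, with a total branch length of 17.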

Conclusions from this study
• They constructed a chickpea pan-genome and identified novel genes not reported earlier.
• The divergence tree allowed them to estimate the divergence of Cicer over the last 21 million years.
• They identified selective sweeps of genes under domestication and a bottleneck that led to reduced genetic diversity.

Case study 2 (2021)
Objective:
• To develop a high-quality rice pan-genome of genetically diverse rice accessions through de novo genome assemblies
• To demonstrate the impact of structural variation on environmental adaptation and agronomic traits

Materials and methods
PacBio SMRT sequencing → de novo assembly → assemblies evaluated for completeness using BUSCO (Benchmarking Universal Single-Copy Orthologs)

Results
• They built a pan-genome of cultivated rice comprising 66,636 genes.
• Distribution analysis showed that 20,374 genes were categorized as “core genes” and 46,262 as “dispensable genes”, which included 14,609 accession-private genes.
• They identified an average of 24,469 SVs per accession relative to Nipponbare.
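The gene counts above can be turned into proportions for comparison with the core vs dispensable table. A minimal sketch using only the numbers reported for this rice pan-genome; the function name is illustrative, and it assumes, as stated above, that accession-private genes are counted within the dispensable set.

```python
def pangenome_composition(core, dispensable, private):
    """Percentages of core, dispensable and accession-private genes in a pangenome."""
    total = core + dispensable  # private genes are a subset of the dispensable genes
    return {
        "total": total,
        "core_pct": 100 * core / total,
        "dispensable_pct": 100 * dispensable / total,
        "private_pct": 100 * private / total,
    }


rice = pangenome_composition(core=20_374, dispensable=46_262, private=14_609)
print({key: round(value, 1) for key, value in rice.items()})
```

This gives 66,636 genes in total, about 30.6% core and 69.4% dispensable, with accession-private genes making up about 21.9% of the pan-genome.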

Contribution of SVs to rice environmental adaptation
• OsWAK112d is a known negative regulator of blast resistance.
• Two independent deletions in the OsWAK112d gene contributed to environmental adaptation by enhancing blast resistance in rice.
Fig 1. Schematic illustrating the deletions of OsWAK112d in the LJ and N22 accessions
Fig 2. The distribution of the OsWAK112d deletions in subpopulations of O. sativa and the wild rice population

Association of gene CNVs with variation in agronomic traits
In addition to SVs, gene CNVs (gCNVs) were inferred for 25,549 (38.34%) of the protein-coding genes in the rice pan-genome.
CNV of OsVIL1 is likely associated with flowering time and grain number:
• Short day: early flowering
• Long day: delayed flowering and increased grain number
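The source does not describe how the gCNVs were inferred. One common idea is read depth: in a largely homozygous accession, a gene present in two copies attracts roughly twice as many reads as a single-copy gene. A minimal sketch, assuming mean read depth per gene has already been computed from alignments to the pan-genome; real CNV callers also correct for GC content and mappability and use statistical tests rather than simple rounding. Gene names and depths are hypothetical.

```python
from statistics import median


def call_gene_cnv(gene_depth, baseline_copies=1):
    """Estimate gene copy number from read depth relative to the median gene depth."""
    med = median(gene_depth.values())
    calls = {}
    for gene, depth in gene_depth.items():
        copies = round(baseline_copies * depth / med)
        if copies > baseline_copies:
            state = "gain"
        elif copies < baseline_copies:
            state = "loss"
        else:
            state = "normal"
        calls[gene] = (state, copies)
    return calls


depths = {"geneW": 30.0, "geneX": 60.0, "geneY": 0.0, "geneZ": 31.0}  # hypothetical mean depths
print(call_gene_cnv(depths))
```

Here geneX (about twice the median depth) is called a gain and geneY (no reads) a loss, while geneW and geneZ stay at one copy.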

• De novo assembly of 31 high-quality genomes for genetically diverse
accessions
• Pan-genome-scale resources and a graph-based genome reveal hidden SVs
and gCNVs
• The derived state of O. sativa SVs was inferred using the O. glaberrima
assembly
• SVs and gCNVs have shaped gene expression profiles and agronomic trait
variations

APPLICATIONS OF PAN-GENOMES IN PLANT GENETIC STUDIES AND BREEDING

De novo domestication is essential for utilizing the diversity present in crop wild relatives (CWRs)
• Zsögön et al. edited six loci (SELF PRUNING, OVATE, FASCIATED, FRUIT WEIGHT3.2, MULTIFLORA and LYCOPENE BETA CYCLASE) in S. pimpinellifolium, significantly increasing its yield, productivity and nutritional value and resulting in de novo domestication of tomato.
• Variation at the fruit weight QTL fw3.2, caused by a tandem duplication of a cytochrome P450 gene, was elucidated with the help of the tomato pangenome.
Alonge et al. (2020)

A model of heterosis proposed by Swanson-Wagner
Dominance hypothesis: complementarity of genes between parental lines drives heterosis
❖ Pan-genomics plays an important role in identifying gene members and families contributing to heterosis.
❖ Finding new genes and variants is essential for explaining and exploiting heterosis in crop improvement.

The tomato pangenome uncovers new genes and a rare allele regulating fruit flavor
4. Pangenome uncovers rare alleles
Objectives:
• Construction of a tomato pangenome
• Identification of PAVs and substantial gene loss during domestication
• Identification of rare alleles

Materials and methods
Species | Group | No. of accessions
Solanum pimpinellifolium (SP) | Wild | 78
Solanum cheesmaniae ssp. galapagense (SCG) | Wild | 8
Solanum lycopersicum var. lycopersicum (SLL) | Domesticated | 372
Solanum lycopersicum var. cerasiforme (SLC) | Domesticated | 267
Total | | 725
The genome of each accession was de novo assembled, producing a total of 306 Gb of contigs longer than 500 bp with an N50 of 3,180 bp.
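N50, quoted above for the tomato contigs, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of the calculation, applying the same "longer than 500 bp" filter; the contig lengths are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs")


contigs = [450, 800, 1_200, 3_000, 5_000, 2_500]  # hypothetical contig lengths (bp)
kept = [n for n in contigs if n > 500]            # keep contigs longer than 500 bp
print(n50(kept))  # 3000
```

The 5,000 bp and 3,000 bp contigs together hold 8,000 of the 12,500 retained bases, so the N50 is 3,000 bp.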

[Figure: violin plots for the wild (SP, SCG) and domesticated (SLC, SLL) groups]
Selection of gene PAVs during tomato domestication and breeding
Gene loss during tomato domestication and subsequent improvement

“Who will last in the run?”
Identification of gene PAVs under selection
[Figure: scatter plots of gene selection preference during tomato domestication and improvement]
Phase | Favorable | Unfavorable
Domestication (SLC vs SP) | 120 | 1,213
Improvement (SLL heirlooms vs SLC) | 12 | 665
The results suggest that more genes were selected against than selected for during both domestication and improvement of tomato.

• Many favorable alleles were lost in recent years as a result of breeding that emphasized production over quality traits.
Identification of rare alleles
[Figure: reference allele present in Heinz 1706 vs non-reference allele found in the pangenome]
• TomLoxC (Solyc01g006540) is essential for C5 and C6 green-leaf volatile production in tomato fruit.
• A 4,151-bp (~4-kb substitution) non-reference allele of the TomLoxC promoter was captured in the pan-genome.
• It is a rare allele in cultivated tomatoes, reflecting strong negative selection during domestication.

Frequency of the non-reference TomLoxC promoter allele
• Wild group: S. pimpinellifolium (SP), 47.4%
• Cultivated group: SLC, 8.4%; SLL heirlooms, 1.1%; modern SLL cultivars, 7.2% (all heterozygotes)
• Its presence in modern cultivars is most likely due to recent introgression of the rare allele from wild into cultivated tomatoes to introduce stress tolerance.

• 4,873 novel genes absent from the reference genome were identified.
• PAV analyses revealed substantial gene loss and intense negative selection of genes during domestication and improvement.
• A rare allele in the TomLoxC promoter that was selected against during domestication was identified.
• Lost or negatively selected genes are enriched for traits important in the present breeding scenario.

A major concern in QTL mapping and GWAS based on SNPs from a single reference genome is reference bias.
• A maize gene for resistance to sugarcane mosaic virus was identified by GWAS using markers based on the B73 assembly but not the PH207 assembly, because the gene is absent from PH207 (Coletta et al., 2021).
• Using a pan-genome as the reference can reduce misalignment.
Coletta et al. (2021)

Genomic selection is an alternative approach for complex traits controlled by QTLs with small effects.
• It uses SNPs as predictors and is biased when a single reference genome is used.
• Pangenomic data can be used to identify new markers to improve prediction accuracy.
• Practical haplotype graph (PHG) approach
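The source does not specify a prediction model. A common baseline is ridge-regression BLUP (rrBLUP), in which all marker effects are shrunk towards zero. A minimal sketch, assuming hypothetical training data in which SNP dosages and a pangenome-derived gene PAV are used side by side as predictors; the shrinkage value `lam` is arbitrary here, and the PHG approach itself is not implemented.

```python
def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Ridge regression (rrBLUP-style marker effects) on centred markers and phenotypes."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    XtX = [[sum(r[a] * r[b] for r in Xc) + (lam if a == b else 0.0) for b in range(p)]
           for a in range(p)]
    Xty = [sum(r[a] * v for r, v in zip(Xc, yc)) for a in range(p)]
    return my, mx, _solve(XtX, Xty)


def predict(model, X):
    """Genomic estimated breeding values for new lines."""
    my, mx, beta = model
    return [my + sum((row[j] - mx[j]) * beta[j] for j in range(len(beta))) for row in X]


# Hypothetical training lines: columns 0-1 are SNP dosages (0/1/2), column 2 is a gene PAV (0/1)
X = [[0, 1, 1], [1, 0, 1], [2, 1, 0], [1, 2, 1], [0, 0, 0]]
y = [3.0, 6.0, 4.0, 4.0, 1.0]
model = ridge_fit(X, y, lam=1.0)
print(predict(model, [[1, 1, 1]]))
```

Adding the PAV column is the simplest way to let pangenome-derived presence/absence markers contribute to prediction; with a larger `lam`, all marker effects are shrunk more strongly.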

Employing genomic prediction in crop improvement

Enhance food and nutritional security
Understand the genomic basis of unique traits

Pangenomics enhances the precision and effectiveness of CRISPR-Cas-mediated gene editing.

• The dispensable genome is enriched in environmental response genes.
• Pangenomes can be used to detect sequences associated with agronomically relevant traits.
• This has led to the transition from so-called genomic-assisted to pan-genomic-assisted breeding strategies for:
1. Biotic stress/disease resistance
2. Vernalization and flowering time
3. Fruit, grain, yield and seed quality
4. Abiotic stress tolerance and resistance
5. Plant architecture

• CNVs (duplications) at the Rhg1 locus: resistance to cyst nematode in soybean (Cook et al., 2012)
• Absence of a sulfotransferase gene due to PAVs of various sizes: resistance to Striga in sorghum (Gobena et al., 2019)
• Deletions in the Pi21 gene result in quantitative and durable resistance to blast disease in rice (Fukuoka et al., 2009)

• A gene encoding an Fe2+/Zn2+-regulated transporter is associated with iron deficiency chlorosis in soybean (Liu et al., 2020).
• PAVs in the Sub1A gene confer submergence tolerance (Xu et al., 2006), and two ERF genes, SNORKEL1 and SNORKEL2, control the deep-water response in rice.

• The Brachypodium distachyon reference accession (Bd21) has the shortest flowering time. The dispensable gene Brdisv1ABR21022861m is present in delayed and extremely delayed flowering accessions (Gordon et al., 2017).
• In Brassica napus, pangenome-based GWAS identified variations underlying several agronomically relevant traits, including silique length, seed weight and flowering time (Song et al., 2020).

• GWAS using the pepper pangenome revealed deletions in genes involved in the carotenoid and capsaicinoid biosynthetic pathways (Ou et al., 2018).
• In rice, a 1,212-bp deletion in the GW5 gene causes variation in grain width and grain weight (Liu et al., 2017).

• In wheat, an extra copy of Rht-D1b results in reduced plant height (Li et al., 2012).
• In rice, a deletion in qPE9-1 results in erect panicles (Zhou et al., 2009).

SVs causing variation in agronomically important traits in major crops
Species | Gene (locus) | Trait | Type
Rice | GW5/qSW5/GSE5 | Grain size | PAV
Rice | GL7 | Grain size | CNV
Rice | SGDP7 | Grain size, grain number, yield | CNV
Rice | Pikm1-TS | Blast resistance | PAV
Rice | Pikm2-TS | Blast resistance | PAV
Rice | Sub1A | Submergence tolerance | PAV
Rice | Pup1 | Phosphorus-starvation tolerance | PAV
Rice | SNORKEL1 | Deep water response | PAV
Rice | SNORKEL2 | Deep water response | PAV
Rice | qPE9-1 | Plant architecture | PAV
Rice | Pi21 | Blast resistance | PAV
Rice | Sc | Hybrid male sterility | CNV
Rice | DPL1/DPL2 | Hybrid male sterility | PAV
Rice | S27/S28 | Hybrid male sterility | PAV
Rice | OsSh1 | Shattering | PAV
Maize | KRN4 | Kernel row number | PAV
Maize | TB1 | Apical dominance | PAV
Maize | Vgt1 | Flowering time | PAV
Maize | qHSR1 | Resistance to head smut | PAV
Maize | ZmCCT9 | Photoperiod sensitivity | PAV
Maize | MATE1 | Aluminium tolerance | CNV
Sorghum | LGS1 | Resistance to Striga | PAV
Sorghum | Sh1 | Shattering | PAV
Sorghum | SbMATE | Aluminium tolerance | PAV
Wheat | Lr10 | Leaf rust | PAV
Wheat | Yr36 | Stripe rust | PAV
Wheat | FR-2 | Cold tolerance | CNV
Wheat | Vrn-A1 | Vernalisation | CNV
Wheat | Ppd-B1 | Photoperiod sensitivity | CNV
Wheat | Rht-D1b | Plant height | CNV
Barley | FR-H2 | Frost resistance | CNV
CNV: copy number variation; PAV: presence/absence variation

Current developments in pangenomics
CURRENT STATUS AND FUTURE
ASPECTS OF PANGENOMIC STUDIES

This database provides:
1. Basic information on 3,010 (3K) rice accessions
2. Sequences and gene annotations for the rice pan-genome
3. Gene presence/absence variations (PAVs) of rice accessions
4. 260 Mbp of novel sequences
5. At least 12,000 novel genes absent from the reference genome
6. Expression profiles for the rice pan-genome

‘RICE PAN-GENOME GENOTYPING ARRAY’ ANALYSIS PORTAL (RAP)
http://www.rpgaweb.com
3K Rice Reference Panel and subsequent GWAS

Uses:
It can scan presence/absence variants (PAVs) and construct a fully annotated pan-genome.
Overview of the ppsPCP pipeline.
(GitHub DOI: https://doi.org/10.5281/zenodo.2567390; webpage: http://cbi.hzau.edu.cn/ppsPCP/)

Tools for pangenome analysis
• GET_HOMOLOGUES-EST
• PanVC
• GenomeMapper
• PanGP
• SplitMem
• ITEP
• EUPAN
• PANNOTATOR
• PanOCT
• seq-seq-pan
Functions:
1. Cluster analysis of functional genes
2. Pangenome profile analysis
3. Genetic variation analysis of functional genes
4. Species evolution analysis
5. Function enrichment of gene clusters

Software/tool | Description/role | URL
PanSeq | Extracts regions unique to a genome, identifies SNPs and builds input files for phylogeny programs | https://lfz.corefacility.ca/panseq/
PanFunPro | Homology detection and pairwise genome analysis in the pan/core genome | https://zenodo.org/record/7583#.YTR36p0zY2w
PGAP | Detection of homologous and orthologous genes and SNPs, phylogenetic studies, pangenome plotting and functional annotation | http://pgap.sf.net
PanACEA | Identification of genomic regions that are phylogenetically dissimilar | https://github.com/JCVenterInstitute/PanACEA
PGAP-X | Analyses genome diversity and visualizes genome structure and gene content to understand evolution | http://pgapx.ybzhao.com/
PAN2HGENE | Identifies new products, altering the behaviour of the pangenome α value without altering the original genomic sequence | https://sourceforge.net/projects/pan2hgene-software
BGDMdocker | Pangenome analysis, visualization, clustering and genome annotation | https://www.docker.com/whatisdocker

Cross-species pangenomes and evolutionary studies
Single species → multiple related species using a super pangenome

Super pangenome
• Useful for transferring genes from species belonging to distantly related gene pools
• Enables breeding of crops better adapted to diverse environments and more resilient to climate change

Challenges in pangenome studies
• Several species have large, complex genomes, making numerous assemblies per taxon cost-prohibitive.
• Assembly errors can lead to the detection of false SVs.
• Consolidating pangenome variation into a single reference or coordinate system is difficult.
• Polyploidy and heterozygosity pose challenges for genome assembly.

FUTURE PROSPECTS
• New tools are required to support variation graph assembly, pangenome construction and visualization.
• An integrated pangenome browser capable of representing SNPs and SVs for genome analysis should be developed.
• Expanding the pangenome beyond species will increase the use of wild gene sequence diversity in crop improvement.
Li et al. (2022)

CONCLUSIONS
• Dispensable genome
• Association studies
• Genome editing
• Next-generation crops
• Evolution studies
