Short Tandem Repeats in plants: Genomic distribution and function prediction
1. Short Tandem Repeats in plants: Genomic distribution and function prediction
Presented by: Rana Asif Abbas
Presented to: Dr Munir Ahmad
Course: PBG-514
2. Introduction
Tandem repeats are short stretches of DNA that are repeated multiple times in succession
within the genomes of eukaryotic organisms.
The repeat unit ranges from a few base pairs to more than a hundred.
Tandem repeats fall into three classes according to repeat-unit length.
Microsatellite DNA, with repeat units of 1-10 bp, is usually termed STR.
Minisatellite DNA, with repeat units of 10-100 bp, is the second class.
Longer satellite DNA has repeat units greater than 100 bp.
STR motif types differ among distantly related species and even among closely
related species.
STRs are mainly present within noncoding regions.
The STR motif types are affected by GC content with more GC-rich motifs being present
within higher GC content regions.
3. Recent STR studies have focused mainly on the role of STRs in human diseases and in the
regulation of gene expression.
Studies have examined STRs in individual plants, but differences in STRs among plant groups
still need to be studied.
Next-generation sequencing technologies make it possible to study genome-wide STRs in
plants.
4. Methods for STR analysis
Sources of genome and CDS sequence
For this research, public literature and genome databases were used to
obtain the genomic information.
Complete genome data for 140 plant species were available and easily obtained
from platforms such as NCBI.
The plants were divided into six groups for the study: algae, mosses, ferns,
gymnosperms, dicots, and monocots.
STR detection and analysis:
To detect STRs, the researchers used MISA (MIcroSAtellite identification tool),
and SPSS 25.0 was used for correlation analysis.
MISA identifies perfect STRs as well as compound STRs. The minimum number of
repeats in MISA was set to 12 for mononucleotides, 6 for dinucleotides, and
4 for trinucleotides.
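The repeat thresholds above can be illustrated with a small Python sketch. This is not MISA and does not reproduce its output format. It is a minimal regular-expression search for perfect STRs, assuming only the three thresholds quoted on this slide (12, 6, and 4). The function name find_perfect_strs and the tuple layout of the results are hypothetical, and compound STRs are not handled.

```python
import re

# Minimum repeat counts per motif length, as quoted on this slide
# (mono = 12, di = 6, tri = 4). Longer motifs are left out because
# their thresholds are not given here.
MIN_REPEATS = {1: 12, 2: 6, 3: 4}


def find_perfect_strs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect STRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One motif of valid bases followed by at least (min_count - 1)
        # further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" or "AAA", which belong to the mononucleotide class.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)
```

For example, a sequence containing twelve A's, six copies of AG, and four copies of CAG yields one hit per run, while three copies of CAG fall below the threshold and are ignored.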
5. The researchers also analyzed STR motifs together with their corresponding
reverse-complement motifs (e.g. CAG and CTG).
Using a custom Perl script, the MISA results were
processed and only valid nucleotides (A, T, G, C) were
counted.
The researchers also calculated STR density and GC content.
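The steps on this slide can be sketched in Python as follows. The study itself used a custom Perl script that is not shown in these slides, so everything here is an illustrative assumption: the function names are hypothetical, a motif is grouped with its reverse complement by keeping the alphabetically smaller of the two, and STR density is assumed to mean total STR length in kb per Mb of genome, matching the kb/Mb unit used later in the slides.

```python
# Translation table for complementing valid nucleotides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Complement each base, then reverse the result."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Give a motif and its reverse complement the same label.

    Simplification: cyclic shifts of a motif (e.g. CAG vs AGC) are
    not merged here.
    """
    motif = motif.upper()
    return min(motif, reverse_complement(motif))


def gc_content(sequence):
    """Percent G+C, counting only valid nucleotides (A, T, G, C)."""
    sequence = sequence.upper()
    valid = sum(sequence.count(base) for base in "ACGT")
    if valid == 0:
        return 0.0
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / valid


def str_density(total_str_bp, genome_bp):
    """Total STR length in kb per Mb of genome (assumed definition)."""
    return (total_str_bp / 1_000) / (genome_bp / 1_000_000)
```

Ambiguous bases such as N are ignored by gc_content, which mirrors the rule that only valid nucleotides are counted.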
6. Long STR analysis and function prediction
STRs longer than 500 base pairs were extracted from the CDS with a
Perl script.
The functional role of long STRs can be studied by translating their sequences with the NCBI
ORF (open reading frame) Finder.
PSIPRED 4.0 was used to predict the secondary structure of the protein, while BLASTP
was used to investigate the role of long repeat sequences.
The study used long STRs of Gossypium hirsutum (250 bp) and Solanum
tuberosum (200 bp).
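To show why translating a long STR is informative, the sketch below translates a repeat in the three forward reading frames using the standard genetic code. It is not the NCBI ORF Finder and does not search for start or stop codons. The function names are hypothetical, and unknown codons are written as X by assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... to match product(BASES, repeat=3).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate DNA in one forward frame (0, 1, or 2); '*' marks a stop."""
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        # Codons containing non-ACGT characters become 'X'.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


def three_frame_translations(dna):
    """Return the translations of all three forward frames."""
    return [translate(dna, frame) for frame in range(3)]
```

A trinucleotide repeat such as CAG gives a homopolymeric peptide in every frame (glutamine, serine, or alanine runs), which is the kind of repetitive protein sequence that secondary-structure prediction and BLASTP would then be applied to.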
7. STR distribution
In total, 283,867,588 STRs were identified
from the 140 plant species, and their distribution
patterns were characterized in the six taxonomic
groups.
STR abundance is positively correlated with genome
size (GS) (Fig. 1A), while genome size is negatively
correlated with STR density (Fig. 1B).
STR density ranges from 9.3 kb/Mb to 58 kb/Mb
in
Among the nine gymnosperm species, the highest
STR density is only 19.4 kb/Mb and the average is
15.1 kb/Mb.
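The correlations between genome size, STR abundance, and STR density reported on this slide were computed in SPSS 25.0. As a stand-in, a Pearson correlation coefficient can be sketched in plain Python. The choice of Pearson's r is an assumption, since the slides do not say which coefficient was used, and the function name pearson_r is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

Passing per-species genome sizes as xs and STR counts as ys would give a value near +1 for a strong positive relationship, while genome sizes against STR densities would give a negative value if density falls as genomes grow.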
8. Distribution of STR types and motif preference
The study shows that GC content of the STRs is highly correlated with the genomic
GC content. For example, the genomic GC content of algae ranges from 36.03% to
65.68% and the GC content of STRs ranges from 27.54% to 87.32%.
Analysis of the top ten motifs of each species shows that a genome with a higher GC
content usually has more GC-rich STR motifs.
Sorghum is an exception: in the Sorghum bicolor genome, although most of its
motifs are GC-rich, the GC content of STRs is 40.02%.
The STR motif distribution patterns of algae were diverse.
The STR motif distribution patterns of dicots were mainly AT rich.
Closely related species showed similar types of motifs, and motif preference in the
three mosses was similar to that of dicots.
9.
10. Genomic patterns of STRs categorized
by motif size
Hexanucleotide STRs were the most abundant
type in the six groups, ranging from 30% to 64%.
Heptanucleotide repeats were the second most frequent
after hexanucleotides, except in algae, where
trinucleotide repeats ranked second.
The least abundant type of repeat was decanucleotide,
accounting for between 0.6% and 2% (Fig. 4).
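Percentages by motif size, like those on this slide, can be tabulated from a list of detected motifs. The sketch below is illustrative only: the function name motif_size_shares is hypothetical, and it assumes each detected STR is represented by its motif string.

```python
from collections import Counter


def motif_size_shares(motifs):
    """Percentage of STRs for each motif length (1 = mono, 2 = di, ...)."""
    counts = Counter(len(motif) for motif in motifs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {size: 100.0 * n / total for size, n in sorted(counts.items())}
```

Running this once per taxonomic group would give one percentage per motif size, which is the form of the comparison made above.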
11. Conclusion
Species with a large genome size offer more opportunities for STRs to arise,
but they have a lower STR density.
Comparatively, monocots have a higher STR density than dicots, and algae show wide
variation in relative abundance and density of STRs compared with flowering plants.
STRs also have functional importance in regulating gene expression.
STRs in eukaryotes are more common in noncoding regions.
STRs are more likely to be produced in genomes with a high or low GC content. The GC
content of genomes affects the density of GC-rich repeats.
GC content in Poaceae and some algae is higher than that in other species, possibly due to
gene structure, recombination patterns, and GC-biased gene conversion.
12. The present research focuses on STR identification and function on a very broad scale
across plant groups. This helps in understanding the genetics of plant groups, but it lacks a
comparison of STR effects between species of the same genus and within a species.
These two comparisons are closest to a breeder's interest, since variety development requires
studying species within a genus or varieties within a species.
There is scope for further studies of STRs among species within a genus and between varieties of
the same species. Examples include comparisons between cotton and wheat, and between wheat and rice.
STRs help in DNA fingerprinting and forensic DNA identification, and can also be
helpful in tracing ancestral linkage and evolution in plants.