2. Repeated sequences (also known as repetitive elements,
or repeats) are patterns of nucleic acids (DNA or RNA) that
occur in multiple copies throughout the genome.
35000 genes
3% coding sequence
(Large no. of repeat sequences)
60% single copy genes
40% repeat sequences
1)30% low to moderately repeat sequences
2)10% highly repeat sequences
Repeat Sequences
3. Multigene Families
Genes Encoding RNA
Pseudogenes
Extra-genic DNA
Categories of Repeat sequences
4. Refers to functional genes present as repeat
sequences (more than 1 copy in the genome)
Has several subcategories
1)Classical Gene families
2)Genes Encoding Domains
3)Genes Encoding Motifs
4)Gene Super-family
Multi-Gene Families
5. Show high degree sequence homology throughout
the gene length or at least at the coding sequence
1st and 2nd copy have same sequences
3rd copy has only same coding sequence
1)Classical Gene Families
6. Examples
Ribosomal RNA genes on the p arm of chromosomes
13, 14, 15, 21 and 22 show classical homology
Histone genes on chromosomes 1, 6 and 12 show a high
degree of sequence homology
1)Classical Gene Families
7. Genes not showing sequence homology
Have variation among copies
In most cases, they encode large DOMAINS (large
sequences which perform specific functions)
Examples
1) Paired Box Domain
2) Homeo box Genes
2)Genes Encoding Domains
8. Paired Box Domain
390 bp
Encodes a paired domain
Present in several genes called PAX genes
Involved in transcription
Homeo Box genes
180 bp
Encodes a homeo domain of 60 amino acids
2)Genes Encoding Domains
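The box sizes above relate to domain sizes through the genetic code (3 bp per codon). A minimal sketch of that arithmetic follows; the function name `domain_length_aa` is only illustrative and is not from the slides.

```python
def domain_length_aa(box_length_bp: int) -> int:
    """Amino acids encoded by a box of the given length (3 bp per codon)."""
    if box_length_bp % 3 != 0:
        raise ValueError("box length is not a whole number of codons")
    return box_length_bp // 3


# Box sizes as stated on the slide
homeobox_aa = domain_length_aa(180)    # homeo box, 180 bp
paired_box_aa = domain_length_aa(390)  # paired box, 390 bp

print(homeobox_aa, paired_box_aa)
```

The 180 bp homeo box gives the 60 amino acid homeo domain quoted above; by the same arithmetic a 390 bp box would correspond to 130 codons.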
9. No sequence homology
No domain encoding sequence
Have a small motif-encoding sequence
MOTIF: a small sequence of nucleotides (in DNA) or of
amino acids (in proteins) performing a specific function
Examples
1) DEAD Box genes
2) WD Gene family
3) Genes Encoding Motifs
10. DEAD Box Genes
8 Amino acid motif
D-> Aspartic Acid
E-> Glutamate
A-> Alanine
D-> Aspartic acid
Encode RNA helicases that alter the secondary structure of RNA molecules
WD Gene Family
2 small amino acid motifs but several WD
W->Tryptophan
D->Aspartic Acid
All are involved in cell division, transcription, mRNA
modifications, cell signalling
3) Genes Encoding Motifs
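A motif family is recognised by a short conserved stretch of amino acids rather than by overall sequence homology. The sketch below scans one-letter protein sequences for the DEAD and WD motifs; the two protein fragments are made-up illustrations, not real gene products, and `find_motif` is a hypothetical helper.

```python
import re


def find_motif(protein: str, motif: str) -> list:
    """Return 0-based start positions of every occurrence of the motif."""
    # A lookahead is used so that overlapping occurrences are also reported.
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, protein)]


# Hypothetical protein fragments (one-letter amino acid code)
dead_box_like = "MKTLGVDEADRMLSG"
wd_like = "GHSWDVAISPDGWDLKT"

dead_hits = find_motif(dead_box_like, "DEAD")  # Asp-Glu-Ala-Asp
wd_hits = find_motif(wd_like, "WD")            # Trp-Asp

print(dead_hits, wd_hits)
```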
11. No sequence homology
No domain encoding sequence
No motif encoding sequence
Protein encoding genes are structurally and
functionally related to each other
Examples
1) Immunoglobulin molecules
2) T-cells receptor Genes
4) Gene Super-family
12. Immunoglobulin Molecules
Domain structure is the same; involved in the immune system
Have three clusters
Present on chromosomes 2, 14 and 22
T-cell Receptor Genes
Have 4 clusters
Present on chromosome 7p,7q,14p, 14q
4) Gene Super-family
13. Genes are present in:
1)Cluster form
A gene cluster is a group of two or more genes found within
an organism's DNA that encode for similar polypeptides, or
proteins, which collectively share a generalized function and
are often located within a few thousand base pairs of each
other.
Within gene clusters, not all genes are fully functional; non-functional
copies of genes, called pseudogenes, are also present
Arrangement of genes in Multigene
Families
14. Examples:
Ribosomal RNA genes are present in 5 clusters on chromosomes 13,
14, 15, 21 and 22. These clusters contain only rRNA genes and no
other genes
Histone Genes present on chromosome 1,6,12. Only histone
genes are present here.
Hemoglobin genes. Alpha chain genes on chromosome 16 and
beta chain genes on chromosome 11
Immune system (HLA) genes on chromosome 6p (more than 200
genes). These clusters are interrupted by different classes of
genes, including genes involved in steroid hormone synthesis
Hox genes have 4 clusters on chromosome 2,7,12 and 17.
1) Cluster form
15. Highly repetitive DNA sequences spread throughout the
genome. These are usually present on different
chromosomes
Examples:
Aldolase gene: 5 copies as repeats on different
chromosomes( 4 functional : 1 pseudocopy)
Actin filament gene: 20 copies (only 4 functional : 16
pseudocopies)
PAX gene(transcription factor gene): 9 copies (1 functional:
8 pseudocopies)
2)Interspersed Form
16. 1)tRNA genes
1300 genes arranged in cluster form
10-100 copies in each cluster
2)rRNA genes
Genes for 28S, 18S and 5.8S rRNA are present in cluster
form on the p arm of chromosomes 13, 14, 15, 21 and 22 (all are
acrocentric chromosomes, i.e. the centromere is located
towards one end)
Genes for 5S rRNA on chromosome 1q
Genes Encoding RNA
17. Size of ribosomal DNA = 2 Mb
On each of the 5 clusters we have a 27 kb intergenic sequence
between two 13 kb transcriptional units:
13 kb transcriptional unit | 27 kb intergenic distance | 13 kb transcriptional unit | 27 kb intergenic distance
Genes Encoding RNA
18. Towards the 5' end of the 13 kb transcriptional unit, we have:
1) A DNA segment called the external transcribed spacer (ETS)
2) An 18S rRNA gene
3) Another segment called internal transcribed spacer 1 (ITS-1)
4) Then the 5.8S rRNA gene, then ITS-2 and then the 28S rRNA gene
13 kb transcriptional unit: ETS | 18S rRNA | ITS-1 | 5.8S rRNA | ITS-2 | 28S rRNA, followed by the 27 kb intergenic distance
The whole unit is transcribed into a
multigenic RNA transcript
Genes Encoding RNA
19. Multi-genic RNA transcript
Size is 45S (S = Svedberg unit of the sedimentation coefficient, i.e. the speed at which
particles settle in a centrifuge; 1 S = 10^-13 s)
It undergoes cleavages:
1st cleavage: removes ETS; 41S is left
2nd cleavage: slightly towards ITS-1; generates two RNA segments of 20S and
32 S
3rd cleavage: removes ITS part; generates 18S RNA
Further Cleavages: ITS-1 & 5.8S; 5.8S & ITS-2; ITS-2 & ID
This is how individual RNAs are produced from the multigenic RNA transcript.
ETS + ITS-1 + ITS-2 = 6 kb
These spacers are cleaved by nucleases to generate individual nucleotides
Genes Encoding RNA
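The sizes quoted on these slides can be tied together with a little arithmetic. The sketch below models the transcription unit as an ordered list of segments and drops the spacers. Only the 13 kb, 27 kb and 6 kb figures come from the slides; the names `SEGMENT_ORDER` and `mature_rrnas` are illustrative, and the model ignores the order of the individual cleavage steps.

```python
# Sizes stated on the slides (kb)
TRANSCRIPTION_UNIT_KB = 13
INTERGENIC_KB = 27
SPACERS_KB = 6  # ETS + ITS-1 + ITS-2 together

# 5' to 3' order of segments in the 45S transcript
SEGMENT_ORDER = ["ETS", "18S", "ITS-1", "5.8S", "ITS-2", "28S"]
SPACERS = {"ETS", "ITS-1", "ITS-2"}


def mature_rrnas(segments):
    """Keep only the rRNA segments, discarding the transcribed spacers."""
    return [s for s in segments if s not in SPACERS]


# One full repeat = transcription unit + intergenic sequence
repeat_unit_kb = TRANSCRIPTION_UNIT_KB + INTERGENIC_KB
# What is left of the transcript once the spacers are removed
mature_kb = TRANSCRIPTION_UNIT_KB - SPACERS_KB

print(mature_rrnas(SEGMENT_ORDER), repeat_unit_kb, mature_kb)
```

Under these figures each repeat spans 40 kb, and about 7 kb of the 13 kb transcript survives as 18S, 5.8S and 28S rRNA.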
20. A pseudogene is a section of a chromosome that is an imperfect copy
of a functional gene
The number of pseudogenes varies from chromosome to
chromosome
Distributed throughout the genome
Present as cluster or interspersed form
5 different categories
Pseudogenes
21. Same structure as that of functional copy i.e. same
introns, exons and other sequences
Most pseudogenes belong to this category
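Example: the pseudocopy has a premature stop codon in the 4th exon
Wild type: stop codon at the normal position
Pseudogene: premature stop codon before the normal stop codon
(this makes the copy of the wild-type gene non-functional)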
i)Non processed/conventional
pseudogenes
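The effect of a premature stop codon can be sketched by reading a coding sequence codon by codon and reporting where translation would end. Both sequences below are short invented examples, not real gene sequences, and `first_stop_index` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Invented coding sequences, reading frame starts at the first base
wild_type = "ATGGCTCAAGGTTTCTAA"   # ATG GCT CAA GGT TTC TAA
pseudocopy = "ATGGCTTAAGGTTTCTAA"  # ATG GCT TAA ...  (CAA -> TAA)

wt_stop = first_stop_index(wild_type)
pg_stop = first_stop_index(pseudocopy)

print(wt_stop, pg_stop)
```

A single base change (CAA to TAA) moves the stop from the last codon to the third codon, so the copy would give a truncated product even though its exon and intron structure is unchanged.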
22. Both have 3 exons and intron sequences
Wild type: stop codon at the normal position
Pseudogene: premature stop codon before the normal stop codon
(the copy was initially expressing)
When 2nd copy was generated it was initially
expressing, then it generated a premature stop
codon on exon 2 and became non functional
ii) Expressed Non-processed
pseudogenes
23. Example
Alpha globin gene cluster on chromosome 16
It has θ gene which never becomes the part of Hb
gene
Initially it has low expression but then develops a stop
codon and becomes non functional
ii) Expressed Non-processed
pseudogenes
24. Processing occurs: it removes the intronic sequences, and the
copy is non-functional
It is present as such in the genome but is non-functional
The same process occurs in cDNA synthesis, where retroviruses
convert RNA into cDNA with the help of reverse transcriptase.
cDNA contains only coding sequences; no intronic
sequences are present.
iii) Processed Pseudogenes
25. How are they present in humans?
During evolution, RT-encoding sequences with RT-like
activity converted mRNA into cDNA and
incorporated it into the genome. This cDNA becomes
non-functional
Gene->RNA->mRNA->cDNA->integrated in chromosome
as such->non-functional
iii) Processed Pseudogenes
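A processed copy differs from its parent gene by the absence of introns. The sketch below joins the exons of a toy gene to mimic what an mRNA-derived cDNA copy would contain. The sequence and exon coordinates are invented for illustration, and `splice` is a hypothetical helper.

```python
def splice(gene: str, exons: list) -> str:
    """Join the exon intervals (0-based, end-exclusive) into one intron-free sequence."""
    return "".join(gene[start:end] for start, end in exons)


# Toy gene: exon 1 + intron (lowercase) + exon 2
gene = "ATGGCC" + "gtaagtttag" + "GATTAA"
exons = [(0, 6), (16, 22)]

processed_copy = splice(gene, exons)

print(processed_copy)
```

The processed copy is shorter than the parent gene by exactly the intron length, which is how such copies are recognised in the genome.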
26. Processed and integrated into the genome; also expressed
because integration occurred at a point which has a
promoter, which helps in expression
iv)Expressed Processed Pseudogenes
27. Only have the 5' and 3' sequences; the rest of the gene is
missing
Wild type: complete gene from 5' end to 3' end
Truncated: 5' end and 3' end only
Or we have an exon and some DNA segment, and the rest
is missing
v) Truncated genes or gene
segments
28. By gene duplication
It has 3 possibilities
1) By recombination between non-allelic sequences among sister chromatids
2) By recombination between non-allelic sequences among non-sister chromatids
3) Slippage mechanism
Recombination between either sister or non-sister chromatids that involves repeat sequences. This
is called the slippage mechanism
e.g. CAG CAG CAG
GTC GTC GTC
(In meiosis, recombination occurs between allelic sequences, while in gene duplication it occurs between
non-allelic sequences)
How are pseudocopies generated?
29. Repeat sequences which are not part of any gene
1) Tandemly Repeated DNA
2) Interspersed Repeats
Extragenic DNA
30. Present next to each other
In blocks or arrays
Depending on the block size which contains repeat
sequences, it can be divided into further 3 categories
i)Satellite DNA
ii)Mini satellite DNA
iii)Micro satellite DNA
1) Tandemly Repeated DNA
31. Size of block varies from 100 kb to several megabases
Further classified into:
a) Satellite 1:
48 bp repeats
Located near centromere heterochromatic(inactive) region
AT rich sequence
b) Satellite 2 and 3:
5 bp repeats ATTCC
Present throughout chromosome
i) Satellite DNA
32. c) α satellite(α-alphoid)
171 bp repeats
Located at centromere heterochromatic region
d) β-satellite
68 bp
Present at the centromere of chromosomes
1, 9, 13, 14, 15, 21, 22
i) Satellite DNA
33. Size of block 0.1 kb to 20 kb
Polymorphic as well
Further divided into 2 categories:
a)Hypervariable mini satellite sequence
Located on sub-telomeric region
Present on all chromosomes
By comparing HVMSS on different chromosomes, we
observe sequence variation
Core sequence GGGCAGGAXG same on all chromosomes
ii)Mini satellite DNA
34. b) Hexa-nucleotide Sequences TTAGGG:
6 bp repeats
Present on telomeric region of all chromosomes
10-15 kb region of telomere has this sequence
Telomerase = RNA + protein; the RNA functions as the template
This sequence is added at the end of the chromosome by
telomerase
ii)Mini satellite DNA
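The 10-15 kb figure can be converted into an approximate number of TTAGGG copies. This is a rough sketch that assumes the whole region is made of perfect repeats, which is a simplification; `repeat_copies` is an illustrative name.

```python
TELOMERE_REPEAT = "TTAGGG"


def repeat_copies(region_bp: int, unit: str = TELOMERE_REPEAT) -> int:
    """Whole copies of the repeat unit that fit in a region of the given size."""
    return region_bp // len(unit)


low = repeat_copies(10_000)   # 10 kb telomeric region
high = repeat_copies(15_000)  # 15 kb telomeric region

print(low, high)
```

Under that assumption a telomere carries on the order of 1,700 to 2,500 copies of the hexanucleotide.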
35. Size of block 50bp-500bp
Present throughout the chromosome
Have several sub categories
Runs of A’s : AAAA--- Repeat sequences
Runs of T’s : TTTT---- Repeat sequences
(about 0.3% of human genome contains such sequences)
Similarly, GGGG---repeats and CCCC---repeats are also present but
they are very rare
Dinucleotide repeat: CA repeat (0.5% of human genome)
CT/AG Repeats : (0.2% of human genome)
Trinucleotide repeats
Tetra-nucleotide repeats
iii) Micro Satellite Sequences
36. All repeats are highly polymorphic i.e no. of repeats varies
e.g. on chromosome 1q
Individual 1: 100 CA
Individual 2: 102 CA
Individual 3: 98 CA
Or within the same individual they are also polymorphic
(one allele = 101 repeats; other allele = 105 repeats)
Tri and tetra nucleotides are more polymorphic than others
These repeats are used in DNA fingerprinting
iii) Micro Satellite Sequences
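Fingerprinting relies on counting how many times the unit is repeated at a given locus in each person. The sketch below counts the longest uninterrupted run of CA in a sequence. The flanking sequences are invented, the repeat numbers are those from the slide, and `longest_tandem_run` is a hypothetical helper.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Number of copies in the longest uninterrupted tandem run of the unit."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Invented flanking sequences around the CA repeat block
FLANK_5, FLANK_3 = "GATTC", "TGGAC"

alleles = {
    "individual_1": FLANK_5 + "CA" * 100 + FLANK_3,
    "individual_2": FLANK_5 + "CA" * 102 + FLANK_3,
    "individual_3": FLANK_5 + "CA" * 98 + FLANK_3,
}

ca_counts = {name: longest_tandem_run(seq, "CA") for name, seq in alleles.items()}

print(ca_counts)
```

In practice the repeat number is usually inferred from the length of a PCR product rather than read from sequence, but the comparison between individuals is the same.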
37. Extragenic repeats distributed throughout the
genome
Subcategories are:
1) SINE-1(AluD)
2) LINE-1(Kpn-1)
3) MER family
4) THE-1
5) HERV family
2) Interspersed Repeats
38. Short interspersed nuclear elements
Contain an AluI restriction site; an Alu-1 sequence occurs
after every 4-6 kb
Size=280 bp
More than 1 million copies
Most abundant repeat sequences
Have 2 units (120 bp repeats)
32 bp additional sequence in the 2nd monomer of the 120 bp
sequence, which is missing in the 1st monomer
120 bp monomer | 120 bp monomer + 32 bp integrated sequence
i)SINE-1
AAA
TTT
AAAA
TTTT
39. 7SL RNA sequence
SRP (signal recognition particle) = 7SL RNA (size is 300
nucleotides) + 6 proteins; it helps in the transport of
proteins
The Alu-1 sequence is very similar to 7SL RNA; Alu-1
is probably derived from 7SL RNA
through transposition
i)SINE-1
40. Internal promoter
Whenever integration occurs, the element carries its promoter
sequence with it
This promoter sequence is present within the element itself
So no other promoter is needed for transcription,
and it can form several copies
Lipoprotein receptor gene
Rare example: present as part of a gene
40 Alu-1 sequences present in the intronic sequence
i)SINE-1
41. Long interspersed nuclear elements
Have a KpnI restriction site
Size = full length 6.1 kb
Several truncated forms (short forms) = 1.4 kb size
60000 copies in genome
Structure: 5' UTR (promoter) | ORF1 (p40) | ORF2 (RT-like activity) | 3' UTR
ORF = open reading frame: has no internal stop codon
5' UTR = promoter sequence of ORF1 and ORF2
ii)LINE-1
42. In truncated forms, we have 5` UTR and other
sequences are missing
1-1.4 kb of different size
Only 35000 copies for function
ii)LINE-1
45. Human endogenous retrovirus like elements
Size=6-10 kb
10000 copies in genome
We have 3 different members of 5 classes which are involved in
transposition
Retrotransposition: transposition which occurs through an RNA intermediate
Its members are:
a) Exogenous retrovirus
b) Retrotransposons
c) Processed pseudogenes
v) HERV Family
46. R U5 gag pol env U3 R
gag, pol and env are the 3 genes of a retrovirus
These three genes are involved in different mechanisms
They give rise to several proteins, including reverse
transcriptase
We isolate these and use them in cDNA synthesis
U3 and U5 are subterminal repeats
Retrovirus Structure
47. LTR RT LTR
LTR = long terminal repeats
RT = reverse transcriptase-encoding
sequence. It shows RT-like activity, as humans have no RT
Duplication at the point of integration
DNA -> RNA -> DNA, which gets integrated at this point;
duplication occurs
a) Exogenous retrovirus
AAA
TTT
48. RT
LTR is missing
Have only RT encoding sequence
Also called retroposons
b) Retrotransposons
AAA
TTT
49. Retropseudogenes
No RT encoding sequence
RT in this is provided by LINE-1 sequence
c) Processed pseudogenes