Human multi gene families

Repeat Sequences and Human
Multigene families

 Repeated sequences (also known as repetitive elements,
or repeats) are patterns of nucleic acids (DNA or RNA) that
occur in multiple copies throughout the genome.
 About 35,000 genes in the human genome
 Only about 3% of the genome is coding sequence
(the rest includes a large number of repeat sequences)
 60% single-copy sequences
 40% repeat sequences
1) 30% low to moderately repeated sequences
2) 10% highly repeated sequences
Repeat Sequences

 Multigene Families
 Genes Encoding RNA
 Pseudogenes
 Extra-genic DNA
Categories of Repeat sequences

 Refers to functional genes present as repeat
sequences (more than 1 copy in the genome)
 Has several subcategories
1) Classical Gene Families
2) Genes Encoding Domains
3) Genes Encoding Motifs
4) Gene Super-family
Multi-Gene Families

 Show a high degree of sequence homology along the whole
gene length, or at least across the coding sequence
 For example, a 1st and 2nd copy may have the same sequence throughout
 while a 3rd copy shares only the coding sequence
1)Classical Gene Families

Examples
Ribosomal RNA genes on the p arms of chromosomes
13, 14, 15, 21 and 22 show classical homology
Histone genes on chromosomes 1, 6 and 12 show a high
degree of sequence homology
1)Classical Gene Families

 Genes do not show sequence homology along their whole length
 Copies vary from one another
 In most cases they share a sequence encoding a large DOMAIN (a large
sequence that performs a specific function)
 Examples
1) Paired Box Domain
2) Homeo box Genes

Paired Box Domain
 390 bp
 Encodes a paired domain
 Present in several genes, called PAX genes
 PAX proteins are involved in transcription (transcription factors)
Homeo Box genes
 180 bp
 Encodes a homeo domain of 60 amino acids
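The box and domain lengths above are linked by the triplet code: three base pairs encode one amino acid. A minimal Python sketch of that arithmetic follows; the function name is illustrative and not from the source.

```python
def domain_length_aa(box_length_bp: int) -> int:
    """Amino acids encoded by a box of the given length (3 bp per codon)."""
    if box_length_bp % 3 != 0:
        raise ValueError("box length is not a whole number of codons")
    return box_length_bp // 3


print(domain_length_aa(180))  # homeobox
print(domain_length_aa(390))  # paired box
```

For the 180 bp homeobox this gives the 60 amino acids stated above; applied to the 390 bp paired box it gives 130.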

 No sequence homology
 No domain encoding sequence
 Have short motif-encoding sequences. MOTIF: a short
sequence of nucleotides or amino acids
that performs a specific function
 Examples
1) DEAD Box genes
2) WD Gene family
3) Genes Encoding Motifs

DEAD Box Genes
 8 conserved amino acid motifs, including the DEAD box itself:
D -> Aspartic acid
E -> Glutamic acid
A -> Alanine
D -> Aspartic acid
 Encoded proteins act on the secondary structure of RNA molecules (RNA helicases)
WD Gene Family
 Short amino acid motifs repeated several times, each repeat typically ending in WD
W -> Tryptophan
D -> Aspartic acid
 Members are involved in cell division, transcription, mRNA
modification and cell signalling
3) Genes Encoding Motifs

 No sequence homology
 No domain encoding sequence
 No motif encoding sequence
 The encoded proteins are structurally and
functionally related to each other
 Examples
1) Immunoglobulin molecules
2) T-cell receptor genes
4) Gene Super-family

Immunoglobulin Molecules
Share the same domain structure and are involved in the immune system
Genes lie in three clusters
Present on chromosomes 2, 14 and 22
T-cell Receptor Genes
Genes lie in 4 clusters
Present on chromosomes 7p, 7q and 14q (two clusters on 14q)
4) Gene Super-family

 Genes are present in:
1)Cluster form
A gene cluster is a group of two or more genes found within
an organism's DNA that encode similar polypeptides, or
proteins, which collectively share a generalized function and
are often located within a few thousand base pairs of each
other.
Not all genes within a gene cluster are functional; non-functional
copies of genes, called pseudogenes, are also present
Arrangement of genes in Multigene
Families

Examples:
 Ribosomal RNA genes are present in 5 clusters on chromosomes 13,
14, 15, 21 and 22. These clusters contain only rRNA genes and no
other genes.
 Histone genes are present on chromosomes 1, 6 and 12. Only histone
genes are present there.
 Hemoglobin genes: alpha chain genes on chromosome 16 and
beta chain genes on chromosome 11.
 Immune system (HLA) genes on chromosome 6p (more than 200
genes). These clusters are interrupted by other classes of
genes, e.g. genes involved in steroid hormone synthesis.
 Hox genes form 4 clusters, on chromosomes 2, 7, 12 and 17.
1) Cluster form

Repeated gene copies spread throughout the
genome. These are usually present on different
chromosomes
Examples:
 Aldolase gene: 5 copies as repeats on different
chromosomes (4 functional : 1 pseudocopy)
 Actin filament gene: 20 copies (only 4 functional : 16
pseudocopies)
 PAX gene (transcription factor gene): 9 copies (1 functional :
8 pseudocopies)
2)Interspersed Form

1) tRNA genes
About 1300 genes arranged in cluster form
10-100 copies in each cluster
2) rRNA genes
Genes for 28S, 18S and 5.8S rRNA are present in cluster
form on the p arms of chromosomes 13, 14, 15, 21 and 22 (all are
acrocentric chromosomes, i.e. the centromere lies
towards one end)
Genes for 5S rRNA are on chromosome 1q
Genes Encoding RNA

 Size of ribosomal DNA = 2 Mb
 On each of the 5 clusters there is a 27 kb intergenic sequence
between two 13 kb transcription units
Genes Encoding RNA
Diagram: 27 kb intergenic distance | 13 kb transcription unit | 27 kb intergenic distance

 Reading from the 5′ end of the 13 kb transcription unit, we have:
1) A DNA segment called the external transcribed spacer (ETS)
2) The 18S rRNA gene
3) Another segment called internal transcribed spacer 1 (ITS-1)
4) Then the 5.8S rRNA gene, then ITS-2 and then the 28S rRNA gene
Diagram: ETS | 18S rRNA | ITS-1 | 5.8S rRNA | ITS-2 | 28S rRNA | 27 kb intergenic distance
The whole unit is transcribed into a single multigenic RNA transcript
Genes Encoding RNA

 Multigenic RNA transcript
 Size is 45S (S = Svedberg unit of the sedimentation coefficient, a measure of how fast particles settle
in a centrifuge; 1S = 10^-13 sec)
 It undergoes a series of cleavages:
 1st cleavage: removes the ETS; a 41S RNA is left
 2nd cleavage: within ITS-1; generates two RNA segments of 20S and
32S
 3rd cleavage: removes the ITS part from the 20S segment; generates 18S rRNA
 Further cleavages: between ITS-1 and 5.8S; 5.8S and ITS-2; ITS-2 and 28S
This is how the individual rRNAs are produced from the multigenic RNA transcript.
ETS + ITS-1 + ITS-2 = 6 kb
 The excised spacers are cleaved by nucleases into individual nucleotides
Genes Encoding RNA
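The processing described above can be summarised as keeping the rRNA segments of the transcription unit and discarding the spacers. The following is a minimal Python sketch under that simplification; it uses only the segment order and the sizes given in the slides, and all names are illustrative.

```python
# Segments of the 13 kb transcription unit, 5' to 3', as listed in the slides.
TRANSCRIPTION_UNIT = [
    ("ETS", "spacer"),
    ("18S", "rRNA"),
    ("ITS-1", "spacer"),
    ("5.8S", "rRNA"),
    ("ITS-2", "spacer"),
    ("28S", "rRNA"),
]

UNIT_KB = 13        # transcribed unit
SPACER_KB = 6       # ETS + ITS-1 + ITS-2
INTERGENIC_KB = 27  # untranscribed distance between units


def process_transcript(unit):
    """Split the primary transcript into mature rRNAs and discarded spacers."""
    mature = [name for name, kind in unit if kind == "rRNA"]
    discarded = [name for name, kind in unit if kind == "spacer"]
    return mature, discarded


mature, discarded = process_transcript(TRANSCRIPTION_UNIT)
print("mature rRNAs:", mature)
print("degraded spacers:", discarded)
print("kb kept in mature rRNAs:", UNIT_KB - SPACER_KB)
print("kb per repeat (unit + intergenic):", UNIT_KB + INTERGENIC_KB)
```

The sketch ignores the order of the individual cleavages (45S, 41S, 20S and 32S intermediates) and only tracks what is kept and what is degraded.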

 A section of a chromosome that is an imperfect copy
of a functional gene
 The number of pseudogenes varies from chromosome to
chromosome
 Distributed throughout the genome
 Present in cluster or interspersed form
 5 different categories
Pseudogenes

 Same structure as the functional copy, i.e. the same
introns, exons and other sequences
 Most pseudogenes belong to this category
 Example: the pseudocopy has a stop codon in the 4th exon
Wild type: stop codon at the normal position
Pseudogene: premature stop codon before the normal stop codon
(makes the copy of the wild-type gene non-functional)
i)Non processed/conventional
pseudogenes
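The effect of a premature stop codon can be illustrated by counting how many codons are read before translation stops. This is a minimal Python sketch; the two sequences are invented toy examples (not real genes) and the function name is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(cds: str) -> int:
    """Count codons read from the start of `cds` before the first in-frame stop."""
    cds = cds.upper()
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return len(cds) // 3  # no stop codon found


# Toy coding sequences; the pseudocopy differs by one base (AAA -> TAA).
wild_type = "".join(["ATG", "GCT", "GAA", "AAA", "CGT", "TGG", "CTG", "TAA"])
pseudogene = "".join(["ATG", "GCT", "GAA", "TAA", "CGT", "TGG", "CTG", "TAA"])

print(codons_before_stop(wild_type))   # codons read in the functional copy
print(codons_before_stop(pseudogene))  # codons read before the premature stop
```

In this toy case the functional copy is read for 7 codons and the pseudocopy for only 3, so the product of the pseudocopy is truncated.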

 Both copies have the same 3 exons and intron sequences
Wild type: stop codon at the normal position
Pseudogene: premature stop codon before the normal stop codon
(initially expressed)
 When the 2nd copy was generated it was initially
expressed; it then acquired a premature stop
codon in exon 2 and became non-functional
ii) Expressed Non-processed
pseudogenes

 Example
Alpha globin gene cluster on chromosome 16
It contains the θ gene, whose product never becomes part of the Hb
molecule
Initially it has low expression, but it then develops a stop
codon and becomes non-functional
ii) Expressed Non-processed
pseudogenes

 Processing of the transcript removes the intronic sequences, and the copy made from it
is non-functional
 It is present as such in the genome but is non-functional
The same process occurs in cDNA synthesis, where retroviral
reverse transcriptase converts RNA into cDNA.
cDNA contains only coding sequences; no intronic
sequences are present.
iii) Processed Pseudogenes

 How did they arise in humans?
During evolution, RT-encoding sequences in the genome, which
have RT-like activity, converted mRNA into cDNA and
incorporated it into the genome. This cDNA is non-functional
Gene -> RNA -> mRNA -> cDNA -> integrated into a chromosome
as such -> non-functional
iii) Processed Pseudogenes
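The route Gene -> RNA -> mRNA -> cDNA can be sketched as joining the exons of a gene and dropping its introns. This is a minimal Python illustration; the gene below is an invented toy example, and the poly(A) tail length is an arbitrary assumption.

```python
def processed_copy(segments, poly_a_length=5):
    """Join exons only (introns spliced out) and append a poly(A) tail,
    mimicking the mRNA-derived cDNA that gives rise to a processed pseudogene."""
    exons = [sequence for kind, sequence in segments if kind == "exon"]
    return "".join(exons) + "A" * poly_a_length


# Toy gene: 3 exons separated by 2 introns (sequences are made up).
gene = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "GATTAC"),
    ("intron", "GTGAGTCCCCAG"),
    ("exon", "TGGTAA"),
]

genomic_length = sum(len(sequence) for _, sequence in gene)
copy = processed_copy(gene)
print(copy)
print(genomic_length, len(copy))
```

The processed copy is shorter than the original gene because it carries no introns, and it also lacks the promoter of the original gene, which is why it is usually not expressed.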

Processed and integrated into the genome, but also expressed,
because integration occurred at a point that has a
promoter, which drives expression
iv)Expressed Processed Pseudogenes

 Only 5′ or 3′ sequences are present; the rest of the gene is
missing
Diagram: wild type (5′ end to 3′ end) compared with a truncated copy
 Or only an exon and some DNA segment are present and the rest
is missing
v) Truncated genes or gene
segments

 By gene duplication
 It has 3 possibilities
1) Recombination between non-allelic sequences on sister chromatids
2) Recombination between non-allelic sequences on non-sister chromatids
3) Slippage mechanism
Recombination between either sister or non-sister chromatids that involves misaligned repeat sequences. This
is called the slippage mechanism
e.g. CAG CAG CAG
     GTC GTC GTC
(In meiosis, recombination occurs between allelic sequences, while in gene duplication it occurs between
non-allelic sequences)
How are pseudocopies generated?
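The outcome of slippage at a repeat such as CAG CAG CAG is a copy that has gained or lost whole repeat units. The Python sketch below models only that outcome, not the molecular mispairing itself; the function and parameter names are illustrative.

```python
def slipped_array(unit: str, n_units: int, slip: int) -> str:
    """Return a tandem array after gaining (+slip) or losing (-slip) repeat units."""
    new_count = n_units + slip
    if new_count < 0:
        raise ValueError("cannot lose more units than the array contains")
    return unit * new_count


parent = "CAG" * 3
gain = slipped_array("CAG", 3, +1)  # one extra unit on this copy
loss = slipped_array("CAG", 3, -1)  # one unit fewer on the other copy

print(parent, gain, loss)
```

When misaligned repeats flank a gene rather than a short motif, the same kind of unequal exchange leaves one product with a duplicated segment and the other with a deletion.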

 Repeat sequences which are not part of any gene
1) Tandemly Repeated DNA
2) Interspersed Repeats
Extragenic DNA

 Repeats are present next to each other
 In blocks or arrays
 Depending on the size of the block that contains the repeat
sequences, it is divided into 3 further categories
i) Satellite DNA
ii) Minisatellite DNA
iii) Microsatellite DNA
1) Tandemly Repeated DNA

 Size of block varies from 100 kb to several megabases
 Further classified into:
a) Satellite 1:
 Repeat unit up to 48 bp
 Located in the heterochromatic (inactive) region near the centromere
 AT-rich sequence
b) Satellite 2 and 3:
 5 bp repeats, ATTCC
 Present throughout the chromosomes
i) Satellite DNA

c) α-satellite (alphoid DNA)
171 bp repeats
Located in the centromeric heterochromatic region
d) β-satellite
68 bp repeats
Present at the centromeres of chromosomes
1, 9, 13, 14, 15, 21 and 22
i) Satellite DNA

 Size of block 0.1 kb to 20 kb
 Polymorphic as well
 Further divided into 2 categories:
a) Hypervariable minisatellite sequences (HVMSS)
 Located in the sub-telomeric regions
 Present on all chromosomes
 Comparing HVMSS on different chromosomes
shows sequence variation
 Core sequence GGGCAGGAXG (X = any nucleotide) is the same on all chromosomes
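Because X in the core sequence stands for any nucleotide, the core can be written as a simple pattern. The following Python sketch uses the standard `re` module to find it; the test sequences are invented and the function name is an assumption.

```python
import re

# GGGCAGGAXG with X = any of A, C, G, T
CORE = re.compile(r"GGGCAGGA[ACGT]G")


def find_core(sequence: str) -> list[int]:
    """Return the 0-based start positions of the minisatellite core sequence."""
    return [match.start() for match in CORE.finditer(sequence.upper())]


print(find_core("TTGGGCAGGATGAA"))
print(find_core("GGGCAGGACGGGGCAGGAAG"))
```

`finditer` reports non-overlapping matches, which is sufficient here because two copies of this 10 bp core cannot overlap.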

b) Hexanucleotide sequence TTAGGG:
6 bp repeats
Present in the telomeric region of all chromosomes
A 10-15 kb region of each telomere consists of this sequence
Telomerase = RNA + protein; the RNA component functions as the template
This sequence is added to the ends of chromosomes by
telomerase
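The figures above imply how many TTAGGG units a telomere holds: the tract length divided by 6 bp. The Python sketch below shows that arithmetic and the repeated addition of the unit to a chromosome end; the function names and the toy end sequence are illustrative.

```python
TELOMERE_UNIT = "TTAGGG"


def repeats_in_tract(tract_bp: int) -> int:
    """Whole TTAGGG units that fit in a telomeric tract of the given length."""
    return tract_bp // len(TELOMERE_UNIT)


def extend_end(chromosome_end: str, n_units: int) -> str:
    """Append n_units copies of the telomeric repeat to a chromosome end."""
    return chromosome_end + TELOMERE_UNIT * n_units


print(repeats_in_tract(10_000), repeats_in_tract(15_000))
print(extend_end("ACGT", 2))
```

A 10-15 kb tract therefore corresponds to roughly 1,700 to 2,500 copies of the hexanucleotide.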

 Size of block 50bp-500bp
 Present throughout the chromosome
 Have several sub categories
 Runs of A's: AAAA... repeats
 Runs of T's: TTTT... repeats
(about 0.3% of the human genome consists of such sequences)
 Similarly, GGGG... and CCCC... repeats are also present, but
they are very rare
 Dinucleotide repeats: CA repeats (0.5% of the human genome)
 CT/AG repeats (0.2% of the human genome)
 Trinucleotide repeats
 Tetranucleotide repeats
iii) Micro Satellite Sequences

 All repeats are highly polymorphic, i.e. the number of repeats varies between individuals
e.g. at a locus on chromosome 1q
Individual 1: 100 CA
Individual 2: 102 CA
Individual 3: 98 CA
 They are also polymorphic within the same individual (
one allele = 101 repeats; other allele = 105 repeats)
 Tri- and tetranucleotide repeats are more polymorphic than others
 These repeats are used in DNA fingerprinting
iii) Micro Satellite Sequences
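The polymorphism described above can be read out by counting the repeat units at a locus in each individual. This is a minimal Python sketch of that idea; the flanking sequences are invented, and real DNA fingerprinting measures fragment lengths at several loci rather than scanning raw sequence.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CA") -> int:
    """Number of units in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def make_allele(n_repeats: int, unit: str = "CA") -> str:
    """Toy allele: made-up flanking sequence around n_repeats copies of unit."""
    return "GATT" + unit * n_repeats + "GGTC"


individuals = {
    "Individual 1": make_allele(100),
    "Individual 2": make_allele(102),
    "Individual 3": make_allele(98),
}

profile = {name: longest_repeat_run(seq) for name, seq in individuals.items()}
print(profile)
```

Two samples with the same repeat counts at many independent loci are very likely to come from the same person, which is the basis of the fingerprinting use mentioned above.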

 Extragenic repeats distributed throughout the
genome
 Subcategories are:
1) SINE-1 (Alu)
2) LINE-1 (KpnI)
3) MER family
4) THE-1
5) HERV family
2) Interspersed Repeats

 Short interspersed nuclear elements
 Contain an AluI restriction site; an Alu
sequence occurs about every 4-6 kb
 Size = 280 bp
 More than 1 million copies
 Most abundant repeat sequences
 Made of 2 units (120 bp repeats)
 The 2nd 120 bp monomer carries an additional 32 bp
sequence which is missing in the 1st monomer
Diagram: 120 bp monomer, A-rich run (AAA/TTT), 120 bp monomer with 32 bp integrated sequence, A-rich run (AAAA/TTTT)
i)SINE-1

7SL RNA sequence
SRP (signal recognition particle) = 7SL RNA (size is 300
nucleotides) + 6 proteins -> helps in the transport of
proteins
The Alu sequence is very similar to 7SL RNA; Alu is
probably derived from 7SL RNA
through transposition
i)SINE-1

Internal promoter
 Whenever integration occurs, the element carries its promoter
sequence with it
 This promoter sequence lies within the element itself
 So no other promoter is needed for transcription,
and the element can form several copies
Lipoprotein receptor gene
 Rare example: Alu sequences present as part of a gene
 40 Alu sequences are present in its intronic sequence
i)SINE-1

 Long interspersed nuclear elements
 Contain a KpnI restriction site
 Size = 6.1 kb at full length
 Several truncated (short) forms = 1.4 kb in size
 60000 copies in genome
Diagram: 5′ UTR (promoter) | ORF1 (p40) | ORF2 (RT-like activity) | 3′ UTR
 ORF = open reading frame: a stretch with no internal stop codon
 5′ UTR = promoter sequence for ORF1 and ORF2
ii)LINE-1

 In truncated forms, the 5′ UTR and other
sequences are missing
 Truncated copies vary in size, about 1-1.4 kb
 Only 35000 copies for function
ii)LINE-1

 Medium reiteration frequency repeats
 Size is a few hundred bp
iii) MER Family

 Transposable human element
 Size= 2-3 kb
iv) THE-1

 Human endogenous retrovirus like elements
 Size=6-10 kb
 We have 3 different members of 5 classes which are involved in
transposition
 Retrotransposition: transposition which occurs through an RNA intermediate
 Its members are:
a) Exogenous retrovirus
b) Retrotransposons
c) Processed pseudogenes
v) HERV Family

R U5 gag pol env U3 R
 gag, pol and env are the 3 genes of a retrovirus
 These three genes are involved in different mechanisms
 They give rise to several proteins, including reverse
transcriptase
 Reverse transcriptase is isolated and used in cDNA synthesis
 U3 and U5 are subterminal repeats
Retrovirus Structure

LTR RT LTR
 LTR = long terminal repeats
 RT = reverse transcriptase encoding
sequence, which supplies RT-like activity in human cells
 Duplication at the point of integration
 DNA -> RNA -> cDNA gets integrated -> at this point
duplication occurs
a) Exogenous retrovirus
AAA
TTT

 RT
 LTRs are missing
 Have only an RT-encoding sequence
 Also called retroposons
b) Retrotransposons
AAA
TTT

 Retropseudogenes
 No RT encoding sequence
 RT in this is provided by LINE-1 sequence
c) Processed pseudogenes
