2. Introduction
► The chimpanzee genome was downloaded from the Ensembl database.
► The chimp genome was mined for long terminal repeat (LTR) retrotransposons using a data mining program, LTR_STRUC, in conjunction with conventional techniques.
3. Flow chart For Identification Of LTR
Retrotransposons
[Flow chart; boxes as labeled in the figure:]
Genome Sequence
Structure-based element prediction (e.g. LTR_STRUC)
Element Set
Similarity searches (e.g. BLAST)
Exhaustive similarity searches (e.g. BLAST)
Sequence analysis (e.g. ClustalX)
Phylogenetic analysis (e.g. MEGA)
Families
Genome mapping (e.g. BLAT)
Gene-element associations
Multiple alignments (e.g. ClustalW)
Consensus (e.g. Consensus)
Specialized databases (e.g. Repbase)
General databases (e.g. NCBI)
[Arrows carry the step numbers 1, 2, 3 and the label FB.]
Fig 2: Workflow for identification of LTR retrotransposons in a genome.
Rectangles represent data. Diamonds represent steps involving the use of bioinformatics tools. Solid arrows represent the general flow of information. Gray arrows represent refinements. Black arrows indicate the alternate flow of information. Numbers indicate the order of steps. FB: feedback.
4. LTR_STRUC
► Searches for long terminal repeats (LTRs) on either side of an element within a certain range of base pairs.
► If the LTRs are found, it searches for other characteristic features of LTR retrotransposons between the two putative LTRs.
► Assigns each hit a score from 0 to 2.0 depending on the presence of the characteristic features.
► Reports all hits with a score above 0.3.
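The reporting step above can be sketched as a simple threshold filter. This is only an illustration: the `Hit` record, the `reported_hits` helper and the example values are hypothetical and do not reflect LTR_STRUC's actual output format or how it computes the score.

```python
from dataclasses import dataclass

# Reporting cutoff quoted on the slide; a hit must score above this value.
REPORT_THRESHOLD = 0.3


@dataclass
class Hit:
    """Hypothetical record for one candidate element (not LTR_STRUC's real format)."""
    name: str
    score: float  # expected to lie in the 0 to 2.0 range described on the slide


def reported_hits(hits, threshold=REPORT_THRESHOLD):
    """Return only the hits whose score is strictly above the threshold."""
    return [h for h in hits if h.score > threshold]


# Made-up example values, for illustration only.
candidates = [
    Hit("cand_a", 0.12),
    Hit("cand_b", 0.30),
    Hit("cand_c", 0.85),
    Hit("cand_d", 1.90),
]
kept = reported_hits(candidates)
print([h.name for h in kept])
```

A strict comparison is used because the slide says "above" 0.3; whether a hit scoring exactly 0.3 is reported by the program is not stated.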
5. LTR_STRUC on Chimp Genome
• Total number of hits above score of 0.3: 2056 (LTR identity: 43.5 – 99.6 %)
Without RT    With RT
1959    97 (LTR identity: 71.4 – 99.4 %)
Score < 0.7    Score > 0.7
42    55 (LTR identity: 71.4 – 99.4 %)
32    23 (LTR identity:
RT conserved motifs present    RT conserved motifs absent
• No correlation between LTR identity and score
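As a quick consistency check on the breakdown above, the counts can be verified to add up at each level. The numbers are taken from the slide; the variable names and the `is_partition` helper are illustrative assumptions, and the two motif subgroups are deliberately left unlabeled because the slide layout does not make clear which count belongs to which label.

```python
# Counts copied from the slide.
total_hits = 2056
without_rt, with_rt = 1959, 97
score_below_0_7, score_above_0_7 = 42, 55
# The 55 hits scoring above 0.7 are split by conserved RT motifs into
# two subgroups; which subgroup is "present" is not assumed here.
motif_subgroups = (32, 23)


def is_partition(parent, parts):
    """True when the parts add up exactly to the parent count."""
    return parent == sum(parts)


checks = {
    "hits = without RT + with RT": is_partition(total_hits, (without_rt, with_rt)),
    "with RT = low score + high score": is_partition(with_rt, (score_below_0_7, score_above_0_7)),
    "high score = motif subgroups": is_partition(score_above_0_7, motif_subgroups),
}
for label, ok in checks.items():
    print(f"{label}: {'ok' if ok else 'MISMATCH'}")

# Share of reported hits that carry an RT, as a percentage.
rt_share = 100 * with_rt / total_hits
print(f"hits with RT: {rt_share:.1f} %")
```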
6. Identification Of RT encoding ORF
• The 23 RTs identified are subjected to sequence analysis to determine the RT-encoding open reading frame (ORF) among the 3 ORFs given by LTR_STRUC.
Briefly, sequence analysis involves:
• The amino acid sequences from the three open reading frames of the RT are aligned with previously annotated RTs using ClustalX.
• They are checked for the presence of conserved RT domains as described by Eickbush et al. to determine the ORF encoding RT.
• The ClustalX-predicted RT-encoding ORF is further subjected to BLASTp searches against the NCBI non-redundant database using default parameters to confirm the prediction.
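The idea of picking the RT-encoding ORF out of three candidates can be illustrated with a much simpler stand-in than the alignment-based procedure described above. The sketch below assumes a single conserved motif, the YXDD box (for example YMDD or YVDD) found in many reverse transcriptases, and scans each ORF for it with a regular expression. It is not the ClustalX and BLASTp workflow used in the slides; the `orfs_with_motif` helper and the toy peptide sequences are hypothetical.

```python
import re

# Assumption: the YXDD box stands in for the full set of conserved RT
# domains; a real analysis would check several domains in an alignment.
YXDD = re.compile(r"Y[A-Z]DD")


def orfs_with_motif(orfs):
    """Return the names of ORFs whose amino acid sequence contains a YXDD box."""
    return [name for name, seq in orfs.items() if YXDD.search(seq.upper())]


# Hypothetical toy peptides, for illustration only (not real chimp sequences).
toy_orfs = {
    "orf1": "MKLSTTPGRRAQ",
    "orf2": "LPQGFKNSPTIFQYMDDILLAS",
    "orf3": "GSHWTRKLPPEN",
}
print(orfs_with_motif(toy_orfs))
```

A short motif like this can match by chance in long sequences, which is one reason a confirmation step such as the BLASTp search above is still needed.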
7. Phylogenetic Analysis Of The Initial RTs
[Figure: phylogenetic tree of the chimp RTs (tips labeled Chimp1 to Chimp29) together with reference RT sequences, including HERV families (e.g. ERV9, HERV9, HERV30, HervW, HervH, HervF, HERVE, HervR, HERV3, HervI, HervK, HervS, HervL, HERV16), FELV, GALV, MuLV, PERV, GYPSY, human foamy virus, BLV, SIV, HIV, RSV, MMTV, SRV-1 and RERV. Scale bar: 0.1. Groups are labeled Class 1, Class 2 and Class 3; the annotation "May be Chimp specific" appears twice.]