3. Retrieve sequence information from GenBank
• In this example, the phylogenetic tree is built from the sequences of three coding regions taken from 17 different strains of the human and simian immunodeficiency viruses (HIV and SIV).
These sequences were downloaded from GenBank using their accession numbers.
The three coding regions of interest are:
• Gag protein (group-specific antigen; Gag proteins are encoded by the gag gene and provide the structural elements of the virus)
• Pol polyprotein (encoded by the pol gene)
• Envelope polyprotein precursor (encoded by the env gene)
• These regions are located through the CDS (coding sequence) annotations in the GenBank records, which give the start and end position of each coding region.
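A minimal sketch of what the CDS pointers are, assuming a hand-made record: `rec` is a hypothetical stand-in for one GenBank record, with the `Sequence` field and the `CDS(k).indices` start/end pairs used on slide 8. The sequence and coordinates are invented for illustration and are not HIV data.

```matlab
% Hypothetical toy record mimicking the Sequence and CDS(k).indices fields
% used on slide 8; the sequence and coordinates are invented.
rec.Sequence = 'ttatgaaacccgggtaaccatgtttgggtagcc';
rec.CDS(1).indices = [3 17];   % start and end of the first coding region
rec.CDS(2).indices = [20 31];  % start and end of the second coding region

% Cut each coding region out of the full sequence with its CDS pointers
regions = cell(1, numel(rec.CDS));
for k = 1:numel(rec.CDS)
    idx = rec.CDS(k).indices;
    regions{k} = rec.Sequence(idx(1):idx(2));
end
disp(regions)
```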
7. • If there is no Internet connection, you can load the data with this command instead:
load hivdemodata % <== use this instead of downloading from GenBank if there is no Internet connection
8. Extract the CDS entries for the GAG, POL, and ENV coding regions, then extract the nucleotide sequences using the CDS pointers.
for ind = 1:numViruses
    temp_seq = seqs_hiv(ind).Sequence;
    % Replace the ambiguous bases n, r and y with 'a'
    temp_seq = regexprep(temp_seq,'[nry]','a');
    % Select this strain's gag, pol and env CDS entries (indices stored in data{ind,3})
    CDSs = seqs_hiv(ind).CDS(data{ind,3});
    % Cut each coding region out of the full sequence using its start/end indices
    gag(ind).Sequence = temp_seq(CDSs(1).indices(1):CDSs(1).indices(2));
    pol(ind).Sequence = temp_seq(CDSs(2).indices(1):CDSs(2).indices(2));
    env(ind).Sequence = temp_seq(CDSs(3).indices(1):CDSs(3).indices(2));
end
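A quick check of the `regexprep` call above on a made-up string. It shows that only lowercase n, r and y are replaced: matching is case-sensitive by default, and other IUPAC ambiguity codes such as k are left alone. The broader pattern in `clean_all` is an assumption for illustration, not part of the original example.

```matlab
% Made-up fragment containing the ambiguity codes n, r, y and k
s = 'acgnrytk';
clean = regexprep(s, '[nry]', 'a');            % same pattern as slide 8; k is untouched
upper_s = regexprep('ACGN', '[nry]', 'a');     % unchanged: matching is case-sensitive
% Assumed broader variant covering all lowercase IUPAC ambiguity codes
clean_all = regexprep(s, '[bdhkmnrsvwy]', 'a');
```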
13. % Convert nucleotide sequences to amino acid sequences using nt2aa.
for ind = 1:numViruses
    aagag(ind).Sequence = nt2aa(gag(ind).Sequence);
    aapol(ind).Sequence = nt2aa(pol(ind).Sequence);
    aaenv(ind).Sequence = nt2aa(env(ind).Sequence);
end
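At its core, translation is a codon-by-codon lookup in the genetic code. The sketch below illustrates that idea in base MATLAB; it assumes the reading frame starts at position 1 and uses a deliberately tiny, hypothetical codon table (`codonTable`) instead of the full standard code that `nt2aa` applies, so it is not a replacement for `nt2aa`.

```matlab
% Hypothetical, deliberately incomplete codon table: only the codons needed here.
codonTable = containers.Map({'ATG','GCC','TGG','TAA'}, {'M','A','W','*'});

nt = 'ATGGCCTGGTAA';
assert(mod(numel(nt), 3) == 0, 'Sequence length must be a multiple of 3');

codons = cellstr(reshape(nt, 3, []).');   % one codon per row, reading frame 1
aaCell = cellfun(@(c) codonTable(c), codons, 'UniformOutput', false);
aa = [aaCell{:}];                          % concatenate the amino-acid letters
```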
% Calculate the distance and linkage, and then generate the tree.
% Jukes-Cantor distances between the POL protein sequences; gapped positions are ignored pairwise
pold = seqpdist(aapol,'Method','Jukes-Cantor','Indels','pairwise-delete');
poltree = seqlinkage(pold,'WPGMA',data(:,1))
plot(poltree,'type','angular');
title('Immunodeficiency virus (POL polyprotein)')
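The 'Jukes-Cantor' option corrects the observed proportion of differing sites p for multiple substitutions at the same site. For k character states the correction is d = -((k-1)/k) ln(1 - (k/(k-1)) p), so k = 4 for nucleotides and k = 20 for amino acids. A minimal sketch for two toy sequences that are assumed to be already aligned and of equal length, skipping gapped sites in the spirit of pairwise deletion; `seqpdist` can also align each pair before counting differences, which this sketch does not do. The formula is only defined for p < (k-1)/k.

```matlab
% Invented toy protein sequences, assumed already aligned (equal length)
a = 'MKTAYIAKQR';
b = 'MKTAYLAKQR';   % one substitution out of 10 sites

keep = (a ~= '-') & (b ~= '-');          % drop sites with a gap in either sequence
p = mean(a(keep) ~= b(keep));            % proportion of differing sites (p-distance)
k = 20;                                  % number of character states (amino acids)
d = -((k-1)/k) * log(1 - (k/(k-1)) * p); % Jukes-Cantor corrected distance
```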
18. weights = [sum(gagd) sum(pold) sum(envd)];
weights = weights / sum(weights);
dist = gagd .* weights(1) + pold .* weights(2) + envd .* weights(3);
% Note that different metrics were used in the calculation of the pairwise
% distances. This could bias the consensus tree. You may wish to
% recalculate the distances for the three regions using the same metric to
% get an unbiased tree.
tree_hiv = seqlinkage(dist,'average',data(:,1));
plot(tree_hiv,'type','angular');
title('Immunodeficiency virus (Weighted tree)')
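Written out, slide 18 computes the following, where D^(r) is the pairwise-distance vector returned by seqpdist for region r (gagd, pold, envd) and p runs over all virus pairs. The 'average' method passed to seqlinkage is UPGMA; for comparison, the WPGMA update used for the POL tree on slide 13 is also shown, with n_i the number of leaves in cluster i.

```latex
\begin{aligned}
S_r &= \sum_{p} D^{(r)}_p, \qquad r \in \{\mathrm{gag}, \mathrm{pol}, \mathrm{env}\} \\
w_r &= \frac{S_r}{S_{\mathrm{gag}} + S_{\mathrm{pol}} + S_{\mathrm{env}}} \\
D_p &= \sum_{r} w_r \, D^{(r)}_p \\
\text{UPGMA (average):}\quad d(k, i \cup j) &= \frac{n_i\, d(k,i) + n_j\, d(k,j)}{n_i + n_j} \\
\text{WPGMA:}\quad d(k, i \cup j) &= \frac{d(k,i) + d(k,j)}{2}
\end{aligned}
```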
21. % Add annotations (arrow positions are given in normalized figure units)
annotation1 = annotation(gcf,'textarrow',[0.2875 0.3089],[0.681 0.7571],...
'Color',[1 0.5 0],'String',{'Possible HIV type 1 origin'},...
'TextColor',[1 0.5 0]);
annotation(gcf,'textarrow',[0.4196 0.4893],[0.5929 0.5405],...
'Color',[1 0 0],'String',{'HIV type 2 origin'},'TextColor',[1 0 0]);
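The `annotation` positions above are hard-coded in normalized figure units, so they only fit one particular figure layout. One way to derive them is to convert data coordinates of the tree plot into normalized units. A minimal sketch, assuming the axes `Units` are 'normalized', both axes are linear and not reversed, and no fixed aspect ratio shrinks the plot box; `data2norm` and the example numbers are hypothetical.

```matlab
% Hypothetical helper: data coordinate v along dimension dim (1 = x, 2 = y)
% to normalized figure units. Assumes normalized axes Units, linear and
% non-reversed axes, and no fixed aspect ratio.
data2norm = @(pos, lim, v, dim) pos(dim) + (v - lim(1)) ./ diff(lim) .* pos(dim + 2);

% Example values; with a real plot they would come from
% get(gca,'Position'), get(gca,'XLim') and get(gca,'YLim').
axPos = [0.13 0.11 0.775 0.815];
xl = [0 10];
yl = [0 20];

xArrow = data2norm(axPos, xl, [2 3], 1);   % arrow tail and head, x
yArrow = data2norm(axPos, yl, [5 8], 2);   % arrow tail and head, y
% annotation(gcf,'textarrow',xArrow,yArrow,'String',{'label'})
```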
Copyright 2003-2005 The MathWorks, Inc.
Published with MATLAB® 7.