Re-construction of Phylogenetic tree
using maximum-likelihood methods
PhyML (in a nutshell)
Note: Slides are still under revision
Steps
• Collect homologous sequences.
• Build a multiple sequence alignment (MSA).
• Manually curate the MSA.
• Feed the MSA to a model-testing program to study the pattern of substitution across the sites of the MSA (ProtTest for protein alignments, jModelTest for DNA alignments).
• Select an appropriate substitution model.
• Feed the MSA, a starting tree (e.g., one obtained with the neighbour-joining method), the substitution model and the bootstrap settings to PhyML.
• Obtain the tree and cross-check the bootstrap values, branch lengths and overall resolution.
• Remove rogue taxa and repeat the entire process until a satisfactory tree is obtained.
Selection of sequences for phylogenetic tree
Purpose of the tree
1. Genealogy: evolution of a gene/gene family irrespective of speciation (a gene tree).
2. Species phylogeny: evolution of a gene/gene family in the context of speciation (a species tree).
Homologues: genes derived from a common ancestor.
Orthologues: homologues separated from each other by speciation (i.e., after speciation the same ancestral gene evolves independently under the different constraints faced by the two species).
Paralogues: homologues separated from each other by gene/genome duplication (the duplication may predate or follow a speciation event).
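The orthologue/paralogue distinction can be made concrete with the species-overlap rule: in a gene tree, a node is inferred to be a duplication if the species sets of its two child subtrees overlap, and a speciation otherwise. Two genes whose last common ancestor is a speciation node are orthologues; otherwise they are paralogues. The sketch below is a minimal illustration, assuming a toy gene tree written as nested tuples with hypothetical gene names; it is not a substitute for dedicated orthology tools.

```python
from typing import Set

# Toy gene tree (hypothetical): a leaf is (gene_name, species); an internal
# node is a 2-tuple of subtrees. An ancestral gene duplicated into "alpha"
# and "beta", then each copy was inherited by human and mouse.
tree = (
    (("alphaH", "human"), ("alphaM", "mouse")),
    (("betaH", "human"), ("betaM", "mouse")),
)

def is_leaf(node) -> bool:
    return isinstance(node[0], str)

def species(node) -> Set[str]:
    """All species represented below this node."""
    if is_leaf(node):
        return {node[1]}
    return species(node[0]) | species(node[1])

def genes(node) -> Set[str]:
    """All gene names below this node."""
    if is_leaf(node):
        return {node[0]}
    return genes(node[0]) | genes(node[1])

def lca(node, a: str, b: str):
    """Smallest internal node whose subtree contains both genes a and b."""
    for child in node:
        if not is_leaf(child) and {a, b} <= genes(child):
            return lca(child, a, b)
    return node

def relationship(root, a: str, b: str) -> str:
    """Species-overlap rule applied to the last common ancestor of a and b."""
    left, right = lca(root, a, b)
    if species(left) & species(right):
        return "paralogues"   # LCA inferred to be a duplication node
    return "orthologues"      # LCA inferred to be a speciation node

print(relationship(tree, "alphaH", "alphaM"))  # orthologues
print(relationship(tree, "alphaH", "betaM"))   # paralogues
```

Note that alphaH and betaH come from the same species yet are paralogues, because they are separated by the duplication, not by speciation.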
Selecting sequences
• Sequences that give a sufficiently low E-value in BLAST can, in general, be considered homologous.
• Rough rules of thumb for amino-acid similarity:
• <40% similarity: the similarity may well arise by chance and is not necessarily due to homology.
• ~40% similarity: twilight zone for homology (the sequences may or may not be homologous).
• ≥60% similarity: homology can be inferred (~80% or higher similarity for DNA sequences).
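The thresholds above can be applied to a pair of aligned sequences as sketched below. This is a minimal sketch under two assumptions: it computes percent identity (exact matches), which is stricter than "similarity" as reported by BLAST (which also counts conservative substitutions), and it treats the unspecified 40 to 60% range as part of the twilight zone.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

def homology_zone(pct: float) -> str:
    """Map a protein identity value onto the rough zones listed above.
    Treating 40% up to 60% as 'twilight' is an assumption of this sketch."""
    if pct >= 60.0:
        return "homology inferred"
    if pct >= 40.0:
        return "twilight zone"
    return "similarity may be due to chance"

pct = percent_identity("MKV-LAT", "MKVQLSS")
print(round(pct, 1), homology_zone(pct))  # 66.7 homology inferred
```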
• Perform a BLAST search with the new sequence.
• Note the hits obtained and their E-values.
• Follow the hits down the list (increasing E-values) until the E-value suddenly jumps by about three orders of magnitude or more. The E-value is the number of hits with at least that score expected to occur by chance in a database of that size; e.g., 1e-10 means that only about 10^-10 such chance hits are expected, so a match this good is very unlikely to be due to chance rather than homology. A sudden jump from 1e-10 to 1e-5 in the BLAST hit list may indicate that homology is limited to the hits above the jump (those with the lower E-values).
(Note: the E-value depends on the size of the sequence database. For the same alignment score, a larger database gives a larger E-value.)
• Note the annotation or characterization of the encoded proteins, as well as the % similarity and sequence coverage.
• Also note the organisms from which the sequences are derived.
• Select sequences with considerable coverage and similarity for multiple sequence alignment.
• The choice of sequences can be based on the species of origin and their relatedness, or on particular activities and multidomain structures, depending on the basis on which the phylogeny is to be reconstructed.
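Two points from the steps above can be checked numerically: the Karlin-Altschul relation E = K·m·n·e^(-λS) shows why the same score S gives a larger E-value in a larger database (larger n), and the "sudden jump" can be found by comparing consecutive E-values on a log scale. This is a sketch only: the λ and K defaults are illustrative assumptions (they depend on the scoring matrix and gap penalties), real BLAST also applies length corrections, and `hits_before_jump` is a hypothetical helper.

```python
import math
from typing import List, Optional

def karlin_altschul_evalue(score: float, query_len: int, db_len: int,
                           lam: float = 0.267, k: float = 0.041) -> float:
    """E = K * m * n * exp(-lambda * S).
    lam and k are illustrative; real values depend on the scoring system."""
    return k * query_len * db_len * math.exp(-lam * score)

def hits_before_jump(evalues: List[float], jump_orders: float = 3.0) -> Optional[int]:
    """Number of hits (sorted by increasing E-value) listed before the first
    jump of at least `jump_orders` orders of magnitude; None if no such jump."""
    floor = 1e-300  # BLAST may report 0.0; avoid log10(0)
    for i in range(len(evalues) - 1):
        step = (math.log10(max(evalues[i + 1], floor))
                - math.log10(max(evalues[i], floor)))
        if step >= jump_orders:
            return i + 1
    return None

# Same score, database twice as large -> E-value twice as large.
e_small = karlin_altschul_evalue(60, 300, 10**9)
e_large = karlin_altschul_evalue(60, 300, 2 * 10**9)
print(e_large / e_small)
print(hits_before_jump([1e-50, 1e-48, 1e-46, 1e-10, 1e-5]))  # 3
```

The jump threshold (three orders of magnitude) is a judgement call taken from the text, not a statistical test, so the candidate cut-off should still be checked against annotation and coverage.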
MSA - Multiple Sequence Alignment
Different programs are available, e.g., CLUSTAL, DIALIGN, MUSCLE, MAFFT.
THEORETICALLY ANY SEQUENCE CAN BE ALIGNED TO ANY OTHER SEQUENCE.
WHETHER THE ALIGNMENT MAKES BIOLOGICAL SENSE IS A DIFFERENT ISSUE.
CLUSTAL (ClustalW2, ClustalX): ClustalW2 uses progressive alignment. First, every pair of sequences is aligned by dynamic programming, which finds the highest-scoring alignment by accumulating scores for matches and penalties for mismatches (from a substitution matrix) and for opening and extending gaps. The pairwise scores are converted into distances, a guide tree is built from them (by neighbour joining), and the sequences are then aligned progressively following the branching order of the guide tree. Aligning along a guide tree, instead of optimizing all sequences at once, greatly reduces the time required. (Its successor, Clustal Omega, aligns profile hidden Markov models.)
DIALIGN: DIALIGN builds the alignment from gap-free matching segments and does not use gap penalties, so it can align very divergent sequences, which share only local similarity and contain large alignment gaps, more accurately.
MUSCLE: MUSCLE (MUltiple Sequence Comparison by Log-Expectation) builds a fast progressive alignment and then refines it iteratively, repeatedly re-aligning parts of the growing MSA, to produce more accurate alignments in shorter time.
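The dynamic-programming idea behind the pairwise step can be sketched with the classic Needleman-Wunsch global alignment. This is a toy illustration, not ClustalW's code: it assumes a simple match/mismatch score and a linear gap penalty, whereas ClustalW uses substitution matrices and separate gap opening/extension penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple:
    """Global alignment by dynamic programming with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("GATTACA", "GATACA"))
```

Filling the table costs time proportional to len(a) × len(b), which is why progressive aligners only run it pairwise and then follow a guide tree.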
CLUSTAL (ClustalX):
• Feed the sequences in FASTA format (paste them into the input window or load a plain-text file, e.g., *.txt).
E.g.,
>name_of_1st_sequence
AGTGATAGATAG…
>name_of_2nd_sequence
GATAGATCGCTGATCGCTC…
• Run with the default settings.
• Analyze the result:
Gaps are frequent: raise the gap opening penalty, e.g., from 10 (the protein default) to 15, 20, 25 or 30.
Gaps are long but infrequent: raise the gap extension penalty above its current value, e.g., to 2, 3, 4, 5 (the default differs between protein and DNA alignments, so check it first).
No gaps but many mismatches: relax the gap opening penalty (e.g., 5, 6, 7) and/or the gap extension penalty (e.g., 0.1, 0.2, 0.4, 0.5) so that indels can be introduced for a better match.
REPEAT THE ALIGNMENT UNTIL IT IMPROVES.
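To decide which of the cases above applies, it helps to count how many gaps each aligned sequence contains and how long they are. The sketch below reads an aligned FASTA text, lists the gap runs per sequence and computes an affine gap cost. The cost formula gap_open + gap_ext × (length − 1) is one common convention, assumed here; Clustal's internal scoring differs in detail.

```python
from typing import Dict, List

def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def gap_runs(seq: str) -> List[int]:
    """Lengths of consecutive runs of '-' in an aligned sequence."""
    runs, current = [], 0
    for ch in seq:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def affine_gap_cost(runs: List[int], gap_open: float, gap_ext: float) -> float:
    """Total penalty when each gap costs gap_open + gap_ext * (length - 1)."""
    return sum(gap_open + gap_ext * (length - 1) for length in runs)

example = """>seq1
AC--GT-A
>seq2
ACTTGTCA
"""
for name, seq in read_fasta(example).items():
    runs = gap_runs(seq)
    print(name, runs, affine_gap_cost(runs, gap_open=10.0, gap_ext=0.2))
```

With a high opening penalty, one gap of length 3 costs far less than three separate 1-residue gaps, which is why raising the opening penalty suppresses frequent short gaps.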
Manual curation of MSA
• Involves expert (manual) correction, usually of the placement of alignment gaps. This is best understood case by case.
• Involves the removal of rogue taxa, i.e., sequences that do not fit the current MSA because of a disproportionate occurrence of mismatches and gaps. These can usually be identified after the first tree is built, when the bootstrap values and/or branch lengths of particular lineages look questionable (dedicated software is available).
• The larger the sequence set, the more accurate the tree tends to be, but the more time-consuming tree construction by maximum likelihood (ML) becomes.
• The more diverse the sequence set, the more error-prone the tree may be, since the alignment becomes more of an approximation. Hence, closely similar representative sequences from each group in the data set need to be selected. E.g., when studying small-molecule methyltransferases, one may take a few close relatives from the O-, N- and C-methyltransferase groups, since these share considerable homology.
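A first, rough screen for sequences that "do not fit" the MSA can be automated before any tree is built. The sketch below flags sequences that are mostly gaps or that match the column consensus poorly; the two thresholds are arbitrary assumptions, and this is a simple heuristic, not one of the dedicated rogue-taxon or alignment-trimming programs.

```python
from collections import Counter
from typing import Dict, List

def consensus(aln: Dict[str, str]) -> str:
    """Most common character in each alignment column (gaps included)."""
    seqs = list(aln.values())
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def flag_outliers(aln: Dict[str, str], max_gap_frac: float = 0.5,
                  min_identity: float = 0.5) -> List[str]:
    """Names of sequences that are mostly gaps or differ strongly from the consensus."""
    cons = consensus(aln)
    flagged = []
    for name, seq in aln.items():
        gap_frac = seq.count("-") / len(seq)
        compared = [(s, c) for s, c in zip(seq, cons) if s != "-" and c != "-"]
        identity = (sum(s == c for s, c in compared) / len(compared)) if compared else 0.0
        if gap_frac > max_gap_frac or identity < min_identity:
            flagged.append(name)
    return flagged

aln = {
    "a": "MKVLAT",
    "b": "MKVLAS",
    "c": "MKILAT",
    "d": "W--P--",   # mostly gaps, disagrees with the consensus
}
print(consensus(aln), flag_outliers(aln))
```

A flagged sequence is a candidate for inspection, not automatic removal; it may simply be a genuinely divergent member of the family.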
Substitution model
• The curated MSA is used as input to programs such as jModelTest (DNA) and ProtTest (proteins), which evaluate how well candidate substitution models fit the pattern of substitution across the sites of the MSA and rank the models accordingly. E.g., the simplest model, Jukes-Cantor (JC), assumes that every nucleotide is replaced by each of the other three at the same rate and that all bases are equally frequent. Although this is unrealistic for most real sequences, a particular data set may fit it adequately, in which case JC can be used for the analysis in PhyML. The Kimura (K80) model allows transitions (Ts; purine to purine or pyrimidine to pyrimidine changes) and transversions (Tv; purine to pyrimidine or vice versa) to occur at different rates.
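The difference between JC and K80 shows up clearly in their distance formulas: JC uses only the proportion p of differing sites, d = −(3/4)·ln(1 − 4p/3), while K80 splits differences into transitions (P) and transversions (Q), d = −(1/2)·ln(1 − 2P − Q) − (1/4)·ln(1 − 2Q). The sketch below implements both as standard textbook formulas; skipping gap and ambiguous columns is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def k80_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter (K80) distance; gap/ambiguous columns are skipped."""
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    diffs = [(x, y) for x, y in pairs if x != y]
    # Transition: both bases purines (A<->G) or both pyrimidines (C<->T).
    ts = sum(1 for x, y in diffs if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    tv = len(diffs) - ts
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)

# 1 transition (A->G) and 2 transversions (A->C, A->T) in 10 sites.
print(k80_distance("AAAAAAAAAA", "GCTAAAAAAA"), jc69_distance(0.3))
```

In this example the transitions make up exactly one third of the differences, which is what JC implies, so both distances coincide; when transitions are over-represented (as is typical), K80 gives a different estimate.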
• Many DNA substitution models have been published, and each can be combined with extensions that model rate variation among sites (+I, +G). jModelTest (version 1), for example, compares 88 candidate models: 11 substitution schemes, each with equal or unequal base frequencies (22 models), and each of these in four variants (no extension, +I, +G, +I+G), giving 22 × 4 = 88.
• +I: proportion of invariable sites, i.e., the fraction of alignment sites assumed never to change (e.g., because of strong functional constraint). Modelling these sites separately helps avoid underestimating the amount of change at the variable sites, and hence the branch lengths.
• +G: substitution rates vary among sites according to a gamma distribution, whose shape parameter (alpha) measures how strongly rates vary (a small alpha means strong variation). In practice the distribution is approximated by a few discrete rate categories (commonly four).
• Ts/Tv ratio: the different rates of transitions and transversions are not a separate add-on but a parameter of substitution schemes such as K80 and HKY.
e.g., an MSA may follow a JC, JC+I, JC+G or JC+I+G model.
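Why +I and +G matter can be seen from the probability that a site still shows the same base after a branch of length d (expected substitutions per site). Under JC this is 1/4 + (3/4)·e^(−4d/3). The sketch mixes in a fraction pinv of invariable sites and equally probable rate categories with mean 1 (the variable sites rescaled by 1/(1 − pinv) so the overall mean rate stays 1). The rate values passed in are assumptions for illustration; real programs derive them from the fitted gamma shape parameter.

```python
import math
from typing import Sequence

def jc_p_same(d: float) -> float:
    """JC69 probability that a site shows the same base after branch length d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

def p_same(d: float, pinv: float = 0.0, rates: Sequence[float] = (1.0,)) -> float:
    """JC with optional +I (fraction pinv never changes) and discrete rate
    categories (equally probable, mean 1) standing in for +G."""
    if not 0.0 <= pinv < 1.0:
        raise ValueError("pinv must be in [0, 1)")
    scale = 1.0 / (1.0 - pinv)  # keep the mean rate over all sites equal to 1
    variable = sum(jc_p_same(d * r * scale) for r in rates) / len(rates)
    return pinv + (1.0 - pinv) * variable

d = 0.5
print(jc_p_same(d), p_same(d, pinv=0.3), p_same(d, rates=(0.5, 1.5)))
```

For the same branch length, rate heterogeneity leaves more sites looking unchanged, so a model without +I/+G would explain the observed differences with a shorter branch, i.e., it would underestimate branch lengths.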
Substitution model
• Which substitution model to choose is decided with statistical criteria implemented in both jModelTest and ProtTest, including the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Akaike Information Criterion corrected for small samples (AICc).
• The model with the lowest AIC, AICc or BIC score (not the highest) is usually selected as the most appropriate substitution model for phylogenetic estimation.
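The three criteria trade goodness of fit (log-likelihood lnL) against the number of free parameters k, with n the sample size (usually the alignment length): AIC = −2 lnL + 2k, AICc = AIC + 2k(k+1)/(n − k − 1), BIC = −2 lnL + k·ln(n). The sketch below computes them and picks the model with the lowest value. The model names, log-likelihoods and parameter counts are hypothetical numbers chosen only to show that AIC and BIC can disagree; real parameter counts also include branch lengths.

```python
import math
from typing import Dict, Tuple

def information_criteria(lnL: float, k: int, n: int) -> Dict[str, float]:
    """AIC, AICc and BIC for a model with log-likelihood lnL and k free parameters."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = -2.0 * lnL + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * lnL + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

def best_model(models: Dict[str, Tuple[float, int]], n: int,
               criterion: str = "BIC") -> str:
    """Name of the model with the LOWEST value of the chosen criterion."""
    return min(models, key=lambda name: information_criteria(*models[name], n)[criterion])

# Hypothetical (lnL, k) values for a 1000-site alignment.
models = {"JC": (-5200.0, 1), "K80": (-5100.0, 2), "GTR+G": (-5090.0, 10)}
for crit in ("AIC", "AICc", "BIC"):
    print(crit, best_model(models, n=1000, criterion=crit))
```

BIC penalizes each extra parameter by ln(n) instead of 2, so with long alignments it tends to favour simpler models than AIC does.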
Phylogeny
PhyML implements a set of standard DNA substitution models (e.g., JC69, K80, F81, F84, HKY85, TN93, GTR) as well as user-defined "custom" models, so check that the model chosen with jModelTest is available in PhyML.
After entering all the tested parameters (the MSA, the substitution model and the +I/+G options), tree building can be carried out.
PhyML can start from a user-defined starting tree; if none is supplied, it builds its own BioNJ (neighbour-joining) starting tree.
The tree search can be improved by choosing SPR moves, or the best of NNI and SPR, instead of NNI alone; these explore more tree rearrangements, and branch lengths are optimized together with the topology.
Finally, bootstrapping with 1000 pseudoreplicates is chosen to assess the reliability of the branching topology.
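The settings above map onto PhyML's command-line options. The sketch below only assembles the argument list and does not run anything; it assumes the PhyML 3 flags `-i` (input alignment, PHYLIP), `-d` (nt/aa), `-m` (model), `-b` (bootstrap replicates), `-s` (NNI, SPR or BEST), `-v` and `-a` (`e` = estimate the invariable-site proportion and gamma shape), `-c` (number of rate categories) and `-u` (user starting tree). Verify them against `phyml --help` for the installed version; the file names are hypothetical.

```python
from typing import List, Optional

def phyml_command(alignment: str, datatype: str = "nt", model: str = "HKY85",
                  bootstrap: int = 100, pinv: Optional[str] = "e",
                  gamma: Optional[str] = "e", search: str = "SPR",
                  start_tree: Optional[str] = None) -> List[str]:
    """Assemble a PhyML 3 command line as a list of arguments."""
    cmd = ["phyml", "-i", alignment, "-d", datatype, "-m", model,
           "-b", str(bootstrap), "-s", search]
    if pinv is not None:
        cmd += ["-v", pinv]              # +I: proportion of invariable sites
    if gamma is not None:
        cmd += ["-c", "4", "-a", gamma]  # +G: 4 discrete gamma rate categories
    if start_tree is not None:
        cmd += ["-u", start_tree]        # user starting tree (default: BioNJ)
    return cmd

# Hypothetical GTR+I+G run with 1000 bootstrap pseudoreplicates.
print(" ".join(phyml_command("alignment.phy", model="GTR", bootstrap=1000)))
```

Once the printed command looks right, it can be executed with `subprocess.run(cmd, check=True)`. Returning a list (instead of a shell string) avoids quoting problems with file names.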
Bootstrapping
In (non-parametric) bootstrapping, the program builds pseudoreplicate alignments by sampling alignment columns (sites) at random with replacement, repeats the same tree building on each pseudoreplicate, and then calculates the percentage of pseudoreplicates in which a given branch (clade) is recovered.
A bootstrap value greater than 70% is generally considered significant (well supported).
The more pseudoreplicates are used, the more accurate the support values.
1000 pseudoreplicates are preferable, but considering the computation time required, 100 pseudoreplicates may also suffice.
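The two halves of the procedure can be sketched separately: generating a pseudoreplicate by resampling columns with replacement, and turning the clades found in the replicate trees into a percentage. Tree building for each replicate (what PhyML does internally) is omitted, so the per-replicate clade sets below are hypothetical inputs.

```python
import random
from typing import Dict, FrozenSet, Sequence, Set

def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudoreplicate: sample alignment columns with replacement."""
    names = list(aln)
    length = len(aln[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(aln[name][c] for c in cols) for name in names}

def clade_support(replicate_clades: Sequence[Set[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees whose clade set contains `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

rng = random.Random(42)
aln = {"a": "ACGTAC", "b": "ACGTTT"}
print(bootstrap_alignment(aln, rng))

# Hypothetical clades recovered from four replicate trees.
reps = [{frozenset({"A", "B"})},
        {frozenset({"A", "B"}), frozenset({"C", "D"})},
        {frozenset({"A", "C"})},
        {frozenset({"A", "B"})}]
print(clade_support(reps, frozenset({"A", "B"})))  # 75.0
```

Because whole columns are resampled, each pseudoreplicate keeps the same sequences and length but weights the sites differently; a clade supported by only a few sites will often disappear.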
Re-construction
• Once the tree is generated, it is checked broadly for accuracy: the bootstrap value of each branch as well as disproportionate branch lengths.
• In case of faulty trees, both aspects need to be corrected.
• If the MSA is already properly curated, one may need to remove rogue taxa (taxa that disturb the tree topology or branch lengths) using available software.
The entire process, starting from the search for the optimal substitution model, may need to be repeated.
• If no rogue taxa can be identified, reducing the sequence diversity could also be tried, so that only the more relevant sequences are included in the MSA.
• The NJ starting tree can also be replaced with a user-defined starting tree.
• Tree construction is repeated over several cycles until an appropriate tree is generated.
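Disproportionate branch lengths can be screened for directly in the Newick tree that PhyML writes. The sketch below is a simple heuristic, not a rogue-taxon algorithm: it extracts terminal branch lengths with a regular expression (assuming unquoted leaf labels; internal-node labels such as bootstrap values are skipped because they follow ")") and flags taxa whose terminal branch exceeds an arbitrary multiple of the median.

```python
import re
from statistics import median
from typing import Dict, List

# A leaf is a label right after '(' or ',' followed by ':branch_length'.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.eE+-]+)")

def terminal_branch_lengths(newick: str) -> Dict[str, float]:
    """Map each leaf label to the length of its terminal branch."""
    return {name: float(length) for name, length in LEAF.findall(newick)}

def long_branch_taxa(newick: str, factor: float = 5.0) -> List[str]:
    """Taxa whose terminal branch exceeds `factor` times the median terminal branch."""
    lengths = terminal_branch_lengths(newick)
    cutoff = factor * median(lengths.values())
    return sorted(name for name, length in lengths.items() if length > cutoff)

tree = "((A:0.1,B:0.12)95:0.05,(C:0.09,D:1.5)80:0.04,E:0.11);"
print(long_branch_taxa(tree))  # ['D']
```

A flagged taxon is only a candidate: a long branch can be genuine, so it should be weighed together with the bootstrap values before removal.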

More Related Content

What's hot

Phylogenetic relationships- Homology; Homologous sequences of proteins and D...
Phylogenetic relationships- Homology; Homologous sequences of proteins  and D...Phylogenetic relationships- Homology; Homologous sequences of proteins  and D...
Phylogenetic relationships- Homology; Homologous sequences of proteins and D...
Merin Tess Zacharias
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
Subhranil Bhattacharjee
 
Prosite
PrositeProsite
222397 lecture 16 17
222397 lecture 16 17222397 lecture 16 17
222397 lecture 16 17
mohamedseyam13
 
UPGMA
UPGMAUPGMA
BLAST
BLASTBLAST
Histone Modification: Acetylation n Methylation
Histone Modification: Acetylation n MethylationHistone Modification: Acetylation n Methylation
Histone Modification: Acetylation n MethylationSomanna AN
 
multiple sequence alignment
multiple sequence alignmentmultiple sequence alignment
multiple sequence alignment
harshita agarwal
 
Physical maps and their use in annotations
Physical maps and their use in annotationsPhysical maps and their use in annotations
Physical maps and their use in annotations
Sheetal Mehla
 
Phylogenetics
PhylogeneticsPhylogenetics
Phylogenetics
Afnan Zuiter
 
Genome Mapping
Genome MappingGenome Mapping
Genome Mapping
ruchibioinfo
 
Phylogenetic prediction - maximum parsimony method
Phylogenetic prediction - maximum parsimony methodPhylogenetic prediction - maximum parsimony method
Phylogenetic prediction - maximum parsimony method
Afnan Zuiter
 
Bls 303 l1.phylogenetics
Bls 303 l1.phylogeneticsBls 303 l1.phylogenetics
Bls 303 l1.phylogeneticsBruno Mmassy
 
Multiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham KaushikMultiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham Kaushik
Shubham Kaushik
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In Bioinformatics
Nikesh Narayanan
 
Orthologs,Paralogs & Xenologs
 Orthologs,Paralogs & Xenologs  Orthologs,Paralogs & Xenologs
Orthologs,Paralogs & Xenologs
OsamaZafar16
 
Phylogenetic analysis
Phylogenetic analysis Phylogenetic analysis
Phylogenetic analysis
Nitin Naik
 
Principle and workflow of whole genome bisulfite sequencing
Principle and workflow of whole genome bisulfite sequencingPrinciple and workflow of whole genome bisulfite sequencing
Principle and workflow of whole genome bisulfite sequencing
sciencelearning123
 

What's hot (20)

Phylogenetic relationships- Homology; Homologous sequences of proteins and D...
Phylogenetic relationships- Homology; Homologous sequences of proteins  and D...Phylogenetic relationships- Homology; Homologous sequences of proteins  and D...
Phylogenetic relationships- Homology; Homologous sequences of proteins and D...
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Prosite
PrositeProsite
Prosite
 
222397 lecture 16 17
222397 lecture 16 17222397 lecture 16 17
222397 lecture 16 17
 
UPGMA
UPGMAUPGMA
UPGMA
 
BLAST
BLASTBLAST
BLAST
 
Histone Modification: Acetylation n Methylation
Histone Modification: Acetylation n MethylationHistone Modification: Acetylation n Methylation
Histone Modification: Acetylation n Methylation
 
multiple sequence alignment
multiple sequence alignmentmultiple sequence alignment
multiple sequence alignment
 
Physical maps and their use in annotations
Physical maps and their use in annotationsPhysical maps and their use in annotations
Physical maps and their use in annotations
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
Phylogenetics
PhylogeneticsPhylogenetics
Phylogenetics
 
Genome Mapping
Genome MappingGenome Mapping
Genome Mapping
 
Phylogenetic prediction - maximum parsimony method
Phylogenetic prediction - maximum parsimony methodPhylogenetic prediction - maximum parsimony method
Phylogenetic prediction - maximum parsimony method
 
Bls 303 l1.phylogenetics
Bls 303 l1.phylogeneticsBls 303 l1.phylogenetics
Bls 303 l1.phylogenetics
 
Multiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham KaushikMultiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham Kaushik
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In Bioinformatics
 
Orthologs,Paralogs & Xenologs
 Orthologs,Paralogs & Xenologs  Orthologs,Paralogs & Xenologs
Orthologs,Paralogs & Xenologs
 
Phylogenetic analysis
Phylogenetic analysis Phylogenetic analysis
Phylogenetic analysis
 
BLAST
BLASTBLAST
BLAST
 
Principle and workflow of whole genome bisulfite sequencing
Principle and workflow of whole genome bisulfite sequencingPrinciple and workflow of whole genome bisulfite sequencing
Principle and workflow of whole genome bisulfite sequencing
 

Similar to Phylogenetic analysis in nutshell

Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
journal ijrtem
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
IJRTEMJOURNAL
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
H K Yoon
 
4. sequence alignment.pptx
4. sequence alignment.pptx4. sequence alignment.pptx
4. sequence alignment.pptx
ArupKhakhlari1
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
Afra Fathima
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
Zeeshan Hanjra
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
Ranjan Jyoti Sarma
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blast
Rai University
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastRai University
 
Softwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisSoftwares For Phylogentic Analysis
Softwares For Phylogentic Analysis
Prasanthperceptron
 
BTC 506 Phylogenetic Analysis.pptx
BTC 506 Phylogenetic Analysis.pptxBTC 506 Phylogenetic Analysis.pptx
BTC 506 Phylogenetic Analysis.pptx
ChijiokeNsofor
 
Perl for Phyloinformatics
Perl for PhyloinformaticsPerl for Phyloinformatics
Perl for Phyloinformatics
Rutger Vos
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment Design
Yaoyu Wang
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
mikaelhuss
 
phy prAC.pptx
phy prAC.pptxphy prAC.pptx
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
IJCSEIT Journal
 
Introduction to sequence alignment partii
Introduction to sequence alignment partiiIntroduction to sequence alignment partii
Introduction to sequence alignment partii
SumatiHajela
 
Bioinformatica t4-alignments
Bioinformatica t4-alignmentsBioinformatica t4-alignments
Bioinformatica t4-alignments
Prof. Wim Van Criekinge
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
ammar kareem
 

Similar to Phylogenetic analysis in nutshell (20)

Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
4. sequence alignment.pptx
4. sequence alignment.pptx4. sequence alignment.pptx
4. sequence alignment.pptx
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blast
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blast
 
Softwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisSoftwares For Phylogentic Analysis
Softwares For Phylogentic Analysis
 
BTC 506 Phylogenetic Analysis.pptx
BTC 506 Phylogenetic Analysis.pptxBTC 506 Phylogenetic Analysis.pptx
BTC 506 Phylogenetic Analysis.pptx
 
Perl for Phyloinformatics
Perl for PhyloinformaticsPerl for Phyloinformatics
Perl for Phyloinformatics
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment Design
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 
phy prAC.pptx
phy prAC.pptxphy prAC.pptx
phy prAC.pptx
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Introduction to sequence alignment partii
Introduction to sequence alignment partiiIntroduction to sequence alignment partii
Introduction to sequence alignment partii
 
Bioinformatica t4-alignments
Bioinformatica t4-alignmentsBioinformatica t4-alignments
Bioinformatica t4-alignments
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 

Recently uploaded

Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
muralinath2
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
AADYARAJPANDEY1
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
Cherry
 
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
azzyixes
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
Cherry
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 

Recently uploaded (20)

Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
 
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 

Phylogenetic analysis in nutshell

  • 1. Re-construction of Phylogenetic tree using maximum-likelihood methods PhyML (in nutshell) Note: Slides are still under revision
  • 2. Steps • Collect homologous sequences. • Multiple sequence alignment. • Manually Curing of the multiple sequence alignment. • Feeding the MSA to programs to study the substitution rates in between locations of the sites in the MSA. (ProtTest for protein and jModeltest for DNA alignments). • Selecting an appropriate substitution model. • Feeding the MSA, starting tree (e.g., those obtained with Neighbour-joining method) and substitution model as well as bootstrap properties to PhyML. • Obtain tree and cross-check bootstrap values, branch length and general resolution. • Remove rouge taxons and redo the entire process till satisfactory tree is constructed.
  • 3. Selection of sequences for phylogenetic tree Purpose of the tree 1.Geneology: evolution of gene/ gene family irrespective of speciation (called gene tree). 2.Phenology: evolution of gene/gene family in context of phylogenetic speciation (called species tree). Homologues: Genes derived from common ancestors. Orthologues: Genes derived from common ancestors or homologues that are separated from each other by gene/genome duplication (of course before speciation). Paralogues: Genes derived from common ancestors or homologues that are separated from one another by speciation (i.e., after speciation occurs the same copy of gene evolves under different constraints that are face by the two different species.
  • 4. Selecting sequences •Similar sequences with a sufficiently low BLAST e-value can in general be considered homologous. •<40% amino acid similarity = the similarity is more likely to arise by chance and is not necessarily due to homology. •~40% amino acid similarity = twilight zone for homology (the sequences may or may not be homologous). •≥60% amino acid similarity = homology inferred (~80% or higher similarity for DNA sequences).
  • 5. • Perform a BLAST search with the new sequence. • Note the hits obtained and their e-values. • Follow the hits down the list, with increasing e-values, until the e-value suddenly jumps by about three orders of magnitude. An e-value of 1e-10 means that about 1 × 10^-10 hits with an equal or better score would be expected purely by chance in a search of a database of this size, so the similarity is very unlikely to be a chance occurrence rather than a result of homology. A sudden jump from 1e-10 to 1e-5 in the BLAST hit list may indicate that homology can be trusted only for the hits with the lower e-values. (Note: the e-value depends on the size of the sequence database; for the same alignment score, a search against a larger database gives a larger e-value.) • Note the annotation or characterization of the encoded proteins, as well as the % similarity and query coverage. • Also note the organisms from which they derive. • Select sequences with substantial coverage and similarity for the multiple sequence alignment. • The choice of sequences can be based on the species of origin and their relatedness, or on particular activities and multi-domain structures, depending on the basis on which the phylogeny is to be reconstructed.
  • 6. MSA: multiple sequence alignment. Different programs exist, e.g., CLUSTAL, DIALIGN, MUSCLE, MAFFT. THEORETICALLY ANY SEQUENCE CAN BE ALIGNED TO ANY OTHER SEQUENCE; WHETHER THE ALIGNMENT MAKES BIOLOGICAL SENSE IS A DIFFERENT ISSUE. CLUSTAL (ClustalW2, ClustalX): ClustalW2 uses progressive alignment. All pairs of sequences are first aligned and scored to build a distance matrix, a guide tree is built from this matrix by neighbour-joining, and the sequences are then aligned step by step following the branching order of the guide tree. Each step uses dynamic programming to find the highest-scoring alignment, summing scores for matches and penalizing mismatches and gaps. Because it avoids aligning all sequences simultaneously, this approach greatly reduces the time required for analysis (further details are available online). DIALIGN: DIALIGN builds alignments from gap-free matching segments and does not use gap penalties, so it can be used for more accurate alignment of very divergent sequences that have large alignment gaps. MUSCLE: MUSCLE (MUltiple Sequence Comparison by Log-Expectation) builds a progressive alignment and then refines it iteratively, repeatedly splitting the alignment into two groups and re-aligning them while keeping changes that improve the score; this produces accurate alignments in short time frames.
  • 7. CLUSTAL (ClustalX): •Supply the sequences in FASTA format (paste them into the applet or attach a plain-text *.txt file), e.g., >(name of the 1st sequence) AGTGATAGATAG………… >(name of the 2nd sequence) GATAGATCGCTGATCGCTC….. •Run with the default settings. •Analyze the result. Gaps are frequent: raise the gap opening penalty, e.g., from the default of 10 to 15, 20, 25 or 30. Gaps are long but less frequent: raise the gap extension penalty, e.g., to 2, 3, 4 or 5 (check the default in your version; for multiple alignment in ClustalW/ClustalX it is 0.2). No gaps but many mismatches: relax the gap opening penalty (e.g., 5, 6, 7) and/or lower the gap extension penalty below its current setting (e.g., 0.1) so that indels can be introduced for a better match. REDO THE MSA UNTIL IT IMPROVES.
  • 8. Manual curation of the MSA •Usually involves careful manual adjustment of where alignment gaps are placed; this is best understood case by case. •Involves the removal of rogue taxa, i.e., sequences that do not fit the current MSA because of a disproportionate occurrence of mismatches and gaps. These can usually be identified after the first tree is made, when the bootstrap values and/or branch lengths of particular lineages are questionable (dedicated software is available). •In general, the larger the sequence set, the higher the accuracy of the tree, but the more time-consuming tree construction by maximum likelihood (ML) becomes. •The more diverse the sequence set, the more error-prone the tree may be, since it is only an approximation. Hence closely related representative sequences from each group need to be selected. For example, when studying small-molecule methyltransferases one may take a few close relatives of O-, N- and C-methyltransferases for analysis, since these share considerable homology.
  • 9. Substitution model •The curated MSA can be given as input to programs like jModelTest for DNA and ProtTest for proteins, which estimate the pattern of substitution across the sites of the MSA. Based on this pattern, candidate substitution models are fitted and ranked. For example, the simplest model, Jukes-Cantor (JC), assumes that each DNA base is substituted by any other base at the same rate (with equal base frequencies). Although this is unrealistic, the selected sequences may still fit it adequately, in which case JC can be used for analysis in PhyML. The Kimura (K80) model assumes that transitions (Ts; purine to purine or pyrimidine to pyrimidine changes) and transversions (Tv; purine to pyrimidine or vice versa) occur at different rates. •jModelTest (version 0.1) evaluates 22 DNA substitution models (11 substitution schemes, each with equal or unequal base frequencies), and each model has four variants for rate heterogeneity among sites: none, +I, +G and +I+G, making a total of 22 × 4 = 88 candidate models. •+I: a proportion of invariable sites, i.e., sites assumed never to change anywhere on the tree. Including this parameter keeps such sites from biasing the estimated dissimilarity between sequences. •+G: gamma-distributed rate variation among sites; substitution rates are allowed to vary from site to site following a gamma distribution whose shape parameter (alpha) controls how uneven the rates are. •The transition/transversion (Ts/Tv) ratio is not a separate variant like +I or +G; it is a parameter of models such as K80 and HKY. E.g., an MSA can follow a JC, JC+I, JC+G or JC+I+G model.
  • 10. Substitution model •The choice of substitution model is based on three statistical criteria computed by both jModelTest and ProtTest: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the AIC corrected for small samples (AICc). •The model with the lowest AIC or BIC value (the best trade-off between goodness of fit and number of parameters) is usually selected as the appropriate substitution model for phylogenetic estimation. Phylogeny: PhyML at present incorporates analysis using 32 substitution models for DNA. After all the tested parameters (MSA, substitution model, +I/+G options) have been entered, tree building can be carried out. PhyML can start from a user-defined starting tree; if none is available, it builds its own starting tree with BioNJ, a variant of neighbour-joining. The tree search can be improved by choosing SPR moves (or the option that keeps the best of NNI and SPR) instead of NNI alone; branch lengths are optimized along with the topology. Finally, bootstrapping with 1000 pseudoreplicates is chosen to assess support for the branching topology.
  • 11. Bootstrapping Bootstrapping makes the program repeat the same tree building on pseudoreplicate alignments, each made by sampling alignment columns (sites) at random with replacement until the original length is reached, and then calculate in what percentage of pseudoreplicates each branch (clade) is recovered. A bootstrap value greater than 70% is generally considered significant. The more pseudoreplicates are used, the more precise the support estimates are. 1000 pseudoreplicates are preferable, but given the time required, 100 pseudoreplicates also suffice.
  • 12. Re-construction •Once the tree is generated, it is first checked broadly for reliability using the bootstrap value of each branch and for disproportionate branch lengths. •In case of faulty trees, corrections need to be made on both aspects. •If the MSA is properly curated, one might need to remove rogue taxa (taxa that are problematic for the tree topology or branch lengths) using available software. The entire process, starting again from the search for the optimal substitution model, may need to be repeated. •If no rogue taxa can be identified, reducing the diversity of the sequence set can also be tried, including only the more relevant sequences in the MSA. •The NJ starting tree option can also be changed to a user-defined tree. •Tree construction is repeated over several cycles until an appropriate tree is generated.
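Slide 3 separates orthologues (split by speciation) from paralogues (split by duplication). One common way to make this call on a rooted gene tree, not described in the slides and shown here only as a minimal sketch, is the species-overlap rule: at the node where two genes meet, if the species found on the two sides overlap, the node is taken as a duplication (paralogues), otherwise as a speciation (orthologues). The nested-tuple tree encoding and all gene and species names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a rooted binary gene tree is either a gene name (leaf)
# or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> Set[str]:
    """Return the set of gene names below a node."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def last_common_node(tree: Tree, a: str, b: str) -> Tree:
    """Return the smallest subtree that contains both genes a and b."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if {a, b} <= leaves(child):
            return last_common_node(child, a, b)
    return tree


def relationship(tree: Tree, a: str, b: str, species: Dict[str, str]) -> str:
    """Classify two distinct genes with the species-overlap rule."""
    if a == b:
        raise ValueError("need two different genes")
    if not {a, b} <= leaves(tree):
        raise ValueError("both genes must be in the tree")
    left, right = last_common_node(tree, a, b)
    left_species = {species[g] for g in leaves(left)}
    right_species = {species[g] for g in leaves(right)}
    # The same species on both sides means the split happened by duplication.
    return "paralogues" if left_species & right_species else "orthologues"


# Hypothetical example: one duplication at the root produced copies A and B,
# and each copy was later split by the human/mouse speciation.
gene_tree = (("hsA", "mmA"), ("hsB", "mmB"))
species_of = {"hsA": "human", "mmA": "mouse", "hsB": "human", "mmB": "mouse"}
print(relationship(gene_tree, "hsA", "mmA", species_of))  # orthologues
print(relationship(gene_tree, "hsA", "hsB", species_of))  # paralogues
```

Real pipelines work on gene trees inferred as described in these slides, and the rule can misfire when genes have been lost, so treat its output as a hypothesis.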
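The cut-offs on slide 4 are rules of thumb applied to a pairwise score. A minimal sketch, assuming the two sequences are already aligned to the same length, computes percent identity over ungapped columns and bins it with the slide's thresholds. Note two assumptions: it measures identity rather than the "similarity" used on the slide (similarity would need a substitution matrix), and it treats the 40-60% range, which the slide leaves open, as twilight zone.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are not counted
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def homology_call(pid: float) -> str:
    """Bin a percent value with the slide's thresholds (40-60% band assumed twilight)."""
    if pid >= 60:
        return "homology inferred"
    if pid >= 40:
        return "twilight zone"
    return "possibly chance similarity"


pid = percent_identity("MKV-LA", "MKVQLT")  # 4 identical out of 5 ungapped columns
print(pid, homology_call(pid))  # 80.0 homology inferred
```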
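Slide 5 suggests keeping BLAST hits until the e-value suddenly jumps by about three orders of magnitude. The sketch below applies that heuristic to a ranked list of e-values; the function name and the default `jump_orders=3.0` are assumptions taken from the slide's wording, not a BLAST feature. BLAST can report an e-value of 0.0 for very strong hits, so zeros are mapped to the smallest positive value to avoid a spurious jump.

```python
import math
from typing import Sequence


def homology_cutoff(evalues: Sequence[float], jump_orders: float = 3.0) -> int:
    """Return how many top hits to keep before the first large e-value jump.

    evalues must be in BLAST's ranked order (increasing e-value).
    """
    if not evalues:
        return 0
    # Map 0.0 to the smallest positive e-value so log10 is defined and the
    # step from 0.0 to the first positive value does not count as a jump.
    positive = [e for e in evalues if e > 0]
    floor = min(positive) if positive else 1e-300
    logs = [math.log10(max(e, floor)) for e in evalues]
    for i in range(1, len(logs)):
        if logs[i] - logs[i - 1] >= jump_orders:
            return i  # hits 0..i-1 are kept
    return len(evalues)


hits = [0.0, 1e-50, 1e-48, 1e-10, 1e-9]
print(homology_cutoff(hits))  # 3: the jump from 1e-48 to 1e-10 ends the list
```

The threshold is a judgment call; inspect the hits around the cut-off (annotation, coverage, organism) as the slide recommends.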
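Slide 6 says that each progressive step in ClustalW is a dynamic-programming search for the highest-scoring alignment. The sketch below shows that core idea for just two sequences (Needleman-Wunsch global alignment). It is not ClustalW's implementation: the match, mismatch and linear gap scores are illustrative assumptions, whereas ClustalW uses substitution matrices and separate gap opening and extension penalties.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global pairwise alignment by dynamic programming with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```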
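Slide 7 tunes ClustalX gap penalties according to whether gaps are frequent or long. A minimal sketch for checking that after each run, assuming the aligned output is saved in FASTA format with "-" as the gap character: it parses the file and reports the number of gap openings and the mean gap length. The function names are hypothetical helpers, not part of ClustalX.

```python
from statistics import mean
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; sequences may span lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def gap_runs(seq: str) -> List[int]:
    """Lengths of consecutive '-' runs (each run is one gap opening)."""
    runs, current = [], 0
    for ch in seq:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def gap_summary(alignment: Dict[str, str]) -> Dict[str, float]:
    """Gap openings and mean gap length over the whole alignment."""
    runs = [r for seq in alignment.values() for r in gap_runs(seq)]
    return {"gap_openings": float(len(runs)),
            "mean_gap_length": float(mean(runs)) if runs else 0.0}


aln = parse_fasta(">seq1\nAGT-GAT\nAG\n>seq2\nAG---ATAG\n")
print(gap_summary(aln))  # {'gap_openings': 2.0, 'mean_gap_length': 2.0}
```

Many openings suggest raising the gap opening penalty; few but long gaps suggest raising the extension penalty.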
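Slide 8 describes rogue taxa as sequences with a disproportionate share of mismatches and gaps. The sketch below is a crude, hypothetical pre-screen built on that description, not one of the dedicated rogue-taxon programs the slide mentions. It scores each aligned sequence by its mean p-distance to the others plus its gap fraction and flags scores more than `z` standard deviations above the mean. The scoring formula and the default `z=2.0` are assumptions.

```python
import statistics
from typing import Dict, List


def gap_fraction(seq: str) -> float:
    return seq.count("-") / len(seq) if seq else 0.0


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither has a gap."""
    compared = diff = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diff += x != y
    return diff / compared if compared else 1.0


def flag_outliers(alignment: Dict[str, str], z: float = 2.0) -> List[str]:
    """Names whose mismatch-plus-gap score is an upper outlier (hypothetical rule)."""
    names = list(alignment)
    if len(names) < 3:
        return []
    scores = {}
    for name in names:
        dists = [p_distance(alignment[name], alignment[o]) for o in names if o != name]
        scores[name] = statistics.mean(dists) + gap_fraction(alignment[name])
    mu = statistics.mean(scores.values())
    sd = statistics.pstdev(scores.values())
    if sd == 0:
        return []
    return [n for n in names if (scores[n] - mu) / sd > z]


aln = {f"s{i}": "ACGTACGTAC" for i in range(1, 6)}
aln["s6"] = "TTTTTTTTTT"
print(flag_outliers(aln))  # ['s6']
```

A flagged sequence is only a candidate; confirm with bootstrap support and branch lengths on the actual tree before removing it.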
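Slide 9 contrasts JC (one substitution rate) with K80 (separate transition and transversion rates). The standard distance formulas for the two models make the difference concrete: JC uses only the proportion of differing sites p, d = -3/4 ln(1 - 4p/3), while K80 uses the transition proportion P and transversion proportion Q, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). This is a minimal sketch of pairwise distances only, not of how PhyML fits these models by maximum likelihood; the function names are hypothetical.

```python
import math
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_proportions(a: str, b: str) -> Tuple[float, float]:
    """Proportions of transitional (P) and transversional (Q) differences.

    Only columns where both sequences have A, C, G or T are compared.
    """
    compared = ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return ts / compared, tv / compared


def jc69_distance(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(P: float, Q: float) -> float:
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    w1, w2 = 1 - 2 * P - Q, 1 - 2 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K80 distance is undefined for these proportions")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


P, Q = ts_tv_proportions("AAAAAAAAAA", "GAAAAAAAAC")  # one A/G, one A/C change
print(P, Q, k80_distance(P, Q), jc69_distance(P + Q))
```

When transitions make up exactly one third of the differences (as JC implicitly assumes), the two distances coincide; the further the data depart from that, the more K80 and JC disagree.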
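Slide 10 ranks models with AIC, AICc and BIC, where the lowest value wins. The sketch below applies the standard formulas, AIC = 2k - 2 ln L, AICc = AIC + 2k(k+1)/(n - k - 1) and BIC = k ln n - 2 ln L, with k free parameters and sample size n (commonly the alignment length). The log-likelihoods and parameter counts in the example are made-up numbers for illustration, not output from jModelTest or ProtTest, which also count branch lengths in k.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModelFit:
    name: str
    log_likelihood: float  # maximized ln L of the alignment under the model
    n_params: int          # number of free parameters k


def aic(fit: ModelFit, n: int) -> float:
    return 2 * fit.n_params - 2 * fit.log_likelihood


def aicc(fit: ModelFit, n: int) -> float:
    k = fit.n_params
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(fit, n) + 2 * k * (k + 1) / (n - k - 1)


def bic(fit: ModelFit, n: int) -> float:
    return fit.n_params * math.log(n) - 2 * fit.log_likelihood


CRITERIA: Dict[str, Callable[[ModelFit, int], float]] = {"AIC": aic, "AICc": aicc, "BIC": bic}


def best_model(fits: List[ModelFit], criterion: str, n: int) -> str:
    """Name of the model with the LOWEST value of the chosen criterion."""
    score = CRITERIA[criterion]
    return min(fits, key=lambda f: score(f, n)).name


# Hypothetical fits for an alignment of n = 1000 sites.
fits = [ModelFit("JC", -5000.0, 1), ModelFit("JC+G", -4900.0, 2),
        ModelFit("GTR+G", -4880.0, 10)]
for c in CRITERIA:
    print(c, best_model(fits, c, 1000))
```

With these invented numbers AIC and AICc prefer GTR+G while BIC, which penalizes extra parameters more heavily, prefers JC+G, so it is worth noting which criterion a model choice rests on.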
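Slide 11 describes non-parametric bootstrapping. The sketch below shows its two bookkeeping steps: building one pseudoreplicate by resampling alignment columns with replacement, and computing the percentage of replicate trees that contain a given clade. Building a tree from each pseudoreplicate (which PhyML does internally) is deliberately left out, and representing each replicate tree as a set of clades is an assumption made for illustration.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudoreplicate: resample alignment columns with replacement."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every sequence, so sites stay aligned.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]], clade: Set[str]) -> float:
    """Percentage of replicate trees whose clade set contains the given clade."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(1)  # fixed seed only so the example is reproducible
print(bootstrap_alignment({"a": "ACGT", "b": "AGGT"}, rng))
replicates = [{frozenset({"a", "b"})}, {frozenset({"a", "b"})},
              {frozenset({"a", "c"})}, {frozenset({"a", "b"})}]
print(clade_support(replicates, {"a", "b"}))  # 75.0
```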
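Slides 10 and 12 list the settings passed to PhyML: alignment, substitution model, +I/+G options, tree-search moves, bootstrap replicates and an optional user-defined starting tree. The sketch below assembles a PhyML 3 command line from those settings without running it. The option letters (-i input alignment, usually PHYLIP; -d data type; -m model; -b bootstrap replicates; -s search moves; -o parameters to optimise; -v invariable sites; -c and -a gamma categories and shape; -u user tree) are recalled from PhyML 3 and should be checked against `phyml --help` for your version. The executable name, file names and defaults here are assumptions.

```python
import shlex
from typing import List, Optional


def phyml_args(alignment: str, datatype: str = "nt", model: str = "HKY85",
               bootstrap: int = 100, invariable: bool = False, gamma: bool = True,
               gamma_categories: int = 4, search: str = "SPR",
               user_tree: Optional[str] = None, executable: str = "phyml") -> List[str]:
    """Build (but do not run) a PhyML argument list; names/defaults are assumptions."""
    args = [executable, "-i", alignment, "-d", datatype, "-m", model,
            "-b", str(bootstrap), "-s", search, "-o", "tlr"]
    if invariable:
        args += ["-v", "e"]  # estimate the proportion of invariable sites (+I)
    if gamma:
        args += ["-c", str(gamma_categories), "-a", "e"]  # +G, shape estimated
    else:
        args += ["-c", "1"]  # a single rate category, i.e. no gamma
    if user_tree is not None:
        args += ["-u", user_tree]  # user-defined starting tree instead of BioNJ
    return args


cmd = phyml_args("aln.phy", model="GTR", bootstrap=1000, invariable=True)
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```

Keeping the settings in one function makes it easy to repeat the cycle on slide 12, for example swapping in a user tree with `user_tree="start.tre"` after rogue taxa have been removed.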