BIOINFORMATICS_AND_PHYLOGENY.pdf.pdf

Bioinformatics and Phylogenetic Analysis
Edgar Scott
Multicampus Bioinformatics Education Specialist

What Is Bioinformatics?
• An interdisciplinary field that applies principles and techniques from computer science, probability and statistics, and linguistics to the study of genomic and proteomic sequences
• Biological databases for storing and organizing DNA and protein sequences
• Computational tools for analyzing sequences

Phylogenetic Analysis and Bioinformatics
• Phylogenetics – the study of evolutionary relationships
• Phylogenetic trees are used to represent evolutionary relationships
• Protein or DNA sequences, rather than morphological characters, are used to detect relationships
• Bioinformatics provides both sequence repositories and sequence analysis software

Overview
• Acquiring the Data Set
  • Text searching at the National Center for Biotechnology Information (NCBI)
  • Sequence similarity and homology
  • Sequence similarity searching with the Basic Local Alignment Search Tool (BLAST)
• Analyzing the Data Set
  • Phylogenetic analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
    • Build multiple sequence alignments using ClustalW
    • Build phylogenetic trees

Text Searching at NCBI
• NCBI maintains and provides molecular information and bioinformatics tools for the scientific community
  • GenBank – an archival DNA and protein sequence database
  • RefSeq – a curated DNA and protein sequence database
  • Entrez Gene – a gene-centered database
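Text searches like these can also be run programmatically through NCBI's E-utilities. The sketch below only builds an `esearch` query URL with Python's standard library; it does not contact NCBI. The helper name `build_esearch_url`, the example query, and the `retmax` default of 20 are illustrative assumptions, not part of the original slides.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not fetch) an esearch URL for a text query.

    db:     Entrez database name, e.g. "nucleotide", "protein" or "gene".
    term:   Entrez query string; field tags such as [Organism] are allowed.
    retmax: maximum number of record IDs to return (hypothetical default).
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-escapes the brackets and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("protein", "hemoglobin[Title] AND human[Organism]"))
```

Fetching such a URL (for example with `urllib.request.urlopen`) returns an XML document listing the matching record IDs; NCBI's usage guidelines also ask clients to limit their request rate.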

Sequence Similarity and Homology
• Homologs – sequences that share a common ancestral sequence
  • Paralogs – arise via gene duplication
  • Orthologs – arise via a speciation event
  • Xenologs – arise via horizontal gene transfer
• Evolutionarily related sequences are similar to one another.
• The differences between two homologous sequences reflect the amount of change that has occurred since they last shared a common ancestral sequence.

Sequence Alignments
• Sequence alignment – a process that identifies a series of characters or character patterns that occur in the same order in both sequences
  • Pairwise global alignment (aligns the sequences end to end)
  • Pairwise local alignment (aligns only the most similar subregions)
• Optimal alignment – an alignment in which matching characters are maximized and mismatching characters are minimized; in practice, the highest-scoring alignment under a given scoring scheme, which also penalizes gaps
• Quantifying alignments
  • Alignment score of the optimal alignment
  • Percent identity scores
  • Percent similarity scores
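The alignment concepts above can be made concrete with the Needleman-Wunsch dynamic-programming algorithm for pairwise global alignment. This is a minimal sketch, assuming a toy scoring scheme (match +1, mismatch -1, linear gap penalty -2) rather than the substitution matrices and affine gap penalties that real tools use; the function names are hypothetical. Percent identity is computed here as identical positions divided by alignment length, which is only one of several conventions in use.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap positions as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


score, top, bottom = global_align("ACGT", "ACT")
print(score, top, bottom, percent_identity(top, bottom))  # 1 ACGT AC-T 75.0
```

Flooring every cell at zero and tracing back from the highest-scoring cell instead of the corner turns this into Smith-Waterman local alignment, the kind of alignment BLAST reports.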

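Sequence Similarity Searching
• Basic Local Alignment Search Tool (BLAST)
  • blastp (protein query vs. protein database), blastn (nucleotide vs. nucleotide), blastx (translated nucleotide query vs. protein database), tblastn (protein query vs. translated nucleotide database), and tblastx (translated nucleotide query vs. translated nucleotide database)
  • Local alignments are reported
• Expectation value (E-value) – the number of alignments with a score as good as or better than the score under consideration that an investigator can expect to find by chance in a search of a database of that size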
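The expectation value can be estimated with the Karlin-Altschul formula, E = K·m·n·e^(-λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K are constants that depend on the scoring system. The sketch below (function names hypothetical) also converts S to a bit score S' = (λS - ln K)/ln 2, which gives the equivalent form E = m·n·2^(-S'). The λ and K values in the example are placeholders chosen for illustration, not parameters of any real substitution matrix, and BLAST additionally corrects m and n for edge effects, so this will not reproduce its reported E-values exactly.

```python
import math


def expect_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value: expected chance hits scoring >= raw_score.

    m: query length; n: total database length (residues or bases).
    lam, k: scoring-system constants supplied by the caller.
    """
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Score normalized to bits, so it no longer depends on the scoring scale."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder constants for illustration only (not a real matrix's values).
LAM, K = 0.3, 0.1
for s in (40, 50, 60):
    print(s, round(bit_score(s, LAM, K), 1), expect_value(s, 300, 10_000_000, LAM, K))
```

With these placeholder constants, each 10-point increase in S divides E by e^(10λ), about 20-fold, while doubling the database size doubles E; this is why the same alignment can look significant against a small database but not against a large one.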

Steps to Build a Tree
• Build a multiple sequence alignment of the data set.
• Analyze the multiple sequence alignment using either distance-based methods or character-based methods.
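For the distance-based route, the first step after alignment is a pairwise distance matrix. A minimal sketch, assuming the simplest measure, the p-distance (the fraction of compared sites that differ), with any column containing a gap in either sequence skipped. MEGA offers other distance models and gap treatments, and the function names and example sequences here are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of gap-free aligned columns at which sequences a and b differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict) -> dict:
    """Symmetric p-distance matrix, as nested dicts keyed by sequence name."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    matrix = {name: {name: 0.0} for name in alignment}
    for x, y in combinations(alignment, 2):
        matrix[x][y] = matrix[y][x] = p_distance(alignment[x], alignment[y])
    return matrix


msa = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTTCGTAC",
    "seq3": "ACGAACGTTC",
}
for name, row in distance_matrix(msa).items():
    print(name, [round(row[other], 2) for other in msa])
```

The p-distance underestimates the true amount of change for distantly related sequences, because multiple substitutions at the same site are counted only once; corrected distance models exist for that reason.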

Molecular Evolutionary Genetics Analysis (MEGA) 3.1
• Phylogenetic analysis program
• Constructs multiple sequence alignments using ClustalW
• Provides tree-building methods
  • Distance-based methods: UPGMA, neighbor-joining, minimum evolution
  • Character-based method: maximum parsimony
• Provides a great help document!
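UPGMA (Unweighted Pair Group Method with Arithmetic mean), the first distance-based method listed, is simple enough to sketch directly: repeatedly merge the two closest clusters, then set the merged cluster's distance to every other cluster to the size-weighted average of its members' distances. This is a minimal illustration, not MEGA's implementation; the function name and input format (a dict of pairwise distances) are assumptions, ties are broken alphabetically, and branch lengths are omitted from the Newick output.

```python
from itertools import combinations


def upgma(distances: dict) -> str:
    """Cluster taxa with UPGMA and return the topology as a Newick string.

    distances maps each unordered pair of names, given once as a tuple
    (name_a, name_b), to their distance. Branch lengths are not reported.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {name: 1 for pair in distances for name in pair}  # cluster -> member count
    while len(sizes) > 1:
        # Closest pair of current clusters; ties broken by label for determinism.
        a, b = min(
            combinations(sorted(sizes), 2),
            key=lambda pair: (d[frozenset(pair)], pair),
        )
        merged = f"({a},{b})"
        for c in sizes:
            if c in (a, b):
                continue
            # Average linkage: size-weighted mean of the merged clusters' distances.
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    (root,) = sizes
    return root + ";"


example = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
    ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6,
}
print(upgma(example))  # (((A,B),C),D);
```

UPGMA assumes a constant rate of evolution (a molecular clock) across lineages; neighbor-joining does not make that assumption.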

Multiple Sequence Alignment
• Multiple sequence alignment – an alignment of three or more sequences
• Finding an optimal multiple sequence alignment is computationally classified as NP-hard, so practical programs use heuristics
• Programs
  • ClustalW – fast, applies a progressive method
  • T-Coffee – slower, applies an advanced progressive method
  • Dialign – slow, applies an iterative method
  • Combine – combines multiple sequence alignments
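One common way to score a multiple sequence alignment is the sum-of-pairs score: every pair of rows is scored column by column, and the results are added. Exactly maximizing an objective like this is what makes optimal multiple alignment NP-hard, which is why programs such as ClustalW rely on heuristics like progressive alignment. The sketch assumes toy scores (match +1, mismatch -1, residue against gap -2, gap against gap 0) and a hypothetical function name; ClustalW's own scoring, with sequence weights and substitution matrices, differs.

```python
from itertools import combinations


def sum_of_pairs(rows: list, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score of an alignment given as equal-length strings."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for row_a, row_b in combinations(rows, 2):
        for x, y in zip(row_a, row_b):
            if x == "-" and y == "-":
                continue  # gap aligned to gap contributes nothing
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["ACGT", "AC-T", "ACGA"]))  # 2
```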

Tree-Building Methods
• UPGMA, neighbor-joining, minimum evolution
  • Distance-based methods
  • Analyze the multiple sequence alignment to calculate a distance matrix.
  • A clustering algorithm analyzes the distance matrix to determine which sequences should be clustered.
• Maximum parsimony
  • Character-based method
  • Analyzes the multiple sequence alignment to find the tree with the minimum tree length, i.e., the tree that requires the fewest character-state changes.
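Tree length under maximum parsimony can be computed with Fitch's algorithm, which works up from the leaves of a rooted binary tree: a node's state set is the intersection of its children's sets when that is non-empty and their union otherwise, and each union costs one change. Summing over alignment columns gives the tree length that maximum parsimony minimizes. This is a minimal sketch; the nested-tuple tree format and the function names are assumptions, gaps are treated as an ordinary fifth state, and a real analysis must also search over many tree topologies, which this code does not do.

```python
def fitch_changes(tree, states: dict) -> int:
    """Minimum number of changes for one character on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: maps each leaf name to its character state in this column.
    """

    def visit(node):
        if isinstance(node, str):  # leaf
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment: dict) -> int:
    """Parsimony tree length: Fitch changes summed over all alignment columns."""
    ncols = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(ncols)
    )


aln = {"A": "AAG", "B": "AAG", "C": "GTA", "D": "GTA"}
print(tree_length((("A", "B"), ("C", "D")), aln))  # 3
print(tree_length((("A", "C"), ("B", "D")), aln))  # 6
```

Maximum parsimony would prefer the first topology here, since it explains the data with fewer changes.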

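Tree Reliability
• Bootstrapping – a method for assessing the reliability of trees
• Steps
  • The original data set (the columns of the alignment) is resampled with replacement many times (e.g., 1000).
  • A tree is built from each resampled data set.
  • The trees created from the resampling iterations are compared to the original tree; the percentage of replicate trees that contain a given branch of the original tree is that branch's bootstrap value.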
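The bootstrap steps above can be sketched as follows. Each replicate is built by sampling alignment columns with replacement until it has the original number of columns, and support for a clade is the percentage of replicate trees that contain it. To keep the sketch short and self-contained, the tree builder is a deliberately trivial, hypothetical stand-in (`closest_pair_clade`) that reports only the clade formed by the two most similar sequences; in practice it would be a full method such as neighbor-joining or maximum parsimony. The function names, the 1000-replicate default, and the fixed seed are assumptions.

```python
import random
from itertools import combinations


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, keeping the original length."""
    ncols = len(next(iter(alignment.values())))
    columns = [rng.randrange(ncols) for _ in range(ncols)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def closest_pair_clade(alignment: dict) -> set:
    """Hypothetical toy tree builder: returns only the clade of the two most
    similar sequences (fewest differing columns), as a set of frozensets."""

    def differences(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))

    pair = min(
        combinations(sorted(alignment), 2),
        key=lambda p: (differences(alignment[p[0]], alignment[p[1]]), p),
    )
    return {frozenset(pair)}


def bootstrap_support(alignment: dict, build_clades, replicates: int = 1000, seed: int = 0) -> dict:
    """Percentage of bootstrap replicates whose tree contains each original clade."""
    rng = random.Random(seed)
    original = build_clades(alignment)
    counts = dict.fromkeys(original, 0)
    for _ in range(replicates):
        found = build_clades(bootstrap_replicate(alignment, rng))
        for clade in original:
            if clade in found:
                counts[clade] += 1
    return {clade: 100.0 * hits / replicates for clade, hits in counts.items()}


msa = {"A": "ACGTACGTAA", "B": "ACGTACGTAT", "C": "ACCTAGGTTA", "D": "TCCAAGGTTA"}
print(bootstrap_support(msa, closest_pair_clade))
```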

Review
• Acquiring the Data Set
  • Text searching at the National Center for Biotechnology Information (NCBI)
  • Sequence similarity and homology
  • Sequence similarity searching with the Basic Local Alignment Search Tool (BLAST)
• Analyzing the Data Set
  • Phylogenetic analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
    • Build multiple sequence alignments using ClustalW
    • Build phylogenetic trees
