Concepts of Bioinformatics
Introduction to Bioinformatics
 Biology, chemistry, statistics and computer science combined to solve complex biological problems
 Algorithms and techniques of computer science used to solve the problems faced by molecular biologists
 ‘Information technology applied to the management and analysis of biological data’
 Storage and analysis are two important functions – bioinformaticians build tools for each
 The bio-IT market has grown significantly in the genomic era
Fields of Bioinformatics
[Figure: Biology, Chemistry, Statistics and Computer science overlapping in Bioinformatics]
 The need for bioinformatics has arisen from the recent explosion of publicly available genomic information, such as that resulting from the Human Genome Project.
 Gain a better understanding of gene analysis, taxonomy and evolution.
 Work efficiently on rational drug design and reduce the time taken to develop drugs manually.
 Unravel the wealth of biological information hidden in the mass of sequence, structure, literature and other biological data.
 Has environmental clean-up benefits
 In agriculture, it can be used to produce high-productivity crops
 Gene therapy
 Forensic analysis
 Understanding biological pathways and networks in systems biology
Bioinformatics key areas
[Figure: organisation of knowledge (sequences, structures, functional data), e.g. homology searches. Slide source: "Bioinformatics lecture", March 5, 2002]
Applications of Bioinformatics
 Provides central, globally accessible databases that enable scientists to submit, search and analyze
information and offers software for data studies, modelling and interpretation.
 Sequence Analysis:
Sequence analysis uses sequencing information to identify genes that encode regulatory sequences or peptides. These computational tools also detect DNA mutations in an organism and identify related sequences. Special software is used to detect overlapping fragments and assemble them.
 Prediction of Protein Structure:
It is easy to determine the primary structure of a protein (its amino acid sequence, which is encoded in the DNA), but it is difficult to determine its secondary, tertiary or quaternary structure. Bioinformatics tools can be used to predict these complex protein structures.
 Genome Annotation:
In genome annotation, genomes are marked up to identify regulatory sequences and protein-coding regions. It is a very important part of the Human Genome Project, as it determines the regulatory sequences.
 Comparative Genomics:
Comparative genomics is the branch of bioinformatics that studies the relationship between genomic structure and function across different biological species, enabling scientists to trace the evolutionary processes that occur in the genomes of different species.
 Pharmaceutical Research:
Bioinformatics tools are also helpful in drug discovery, diagnosis and disease management. Complete sequencing of human genes has enabled scientists to make medicines and drugs that can target more than 500 genes. Bioinformatics also allows more accurate prediction during screening.
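The fragment-overlap and assembly step mentioned under Sequence Analysis can be illustrated with a minimal greedy assembler. This is a sketch under simplifying assumptions (error-free reads, a minimum overlap of 3 bases, ties broken by scan order); the function names `overlap` and `greedy_assemble` are hypothetical, and real assemblers use far more elaborate graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is simple but can mis-assemble repeats, which is one reason production tools prefer overlap or de Bruijn graphs.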
S. No | Unix | Windows | Linux
1. | Mostly proprietary (some variants, e.g. BSD, are open source) | Closed source | Open source
2. | Very high security | Low security | High security
3. | Command line | GUI | Hybrid
4. | File system arranged hierarchically | File system arranged in parallel | File system arranged hierarchically
5. | Not user friendly | User friendly | User friendly
6. | Multi-tasking | Multi-tasking | Multi-tasking
Biological databanks and databases
 Very fast growth of biological
data
 Diversity of biological data:
o Primary sequences
o 3D structures
o Functional data
 Database entry usually
required for publication
o Sequences
o Structures
Major primary databases
Nucleic acid | Protein
EMBL (Europe) | PIR – Protein Information Resource
GenBank (USA) | MIPS
DDBJ (Japan) | SWISS-PROT (University of Geneva, now with EBI)
 | TrEMBL (a supplement to SWISS-PROT)
 | NRL-3D
Sequence Databases
 Three databanks exchange data on a daily basis
 Data can be submitted and accessed at any of the three locations
Nucleotides db:
 GenBank - https://www.ncbi.nlm.nih.gov/
 EMBL - https://www.ebi.ac.uk/
 DDBJ - https://www.ddbj.nig.ac.jp/index-e.html
Bibliographic db:
 PubMed, MEDLINE
Specialized db:
 RDP, IMGT, TRANSFAC, MitBase
Genetic db:
 SGD – https://www.yeastgenome.org/
 ACeDB, OMIM
Composite Databases
 Swiss-Prot
 PIR
 GenBank
 NRL-3D
Secondary Databases
 Store structural information or the results of searches of the primary databases
Secondary database | Primary source
PROSITE (https://prosite.expasy.org/) | SWISS-PROT
PRINTS (http://130.88.97.239/PRINTS/index.php) | OWL
Biological Database
Carbohydrate Structure Database:
 CCSD – https://cordis.europa.eu/project/id/BIOT0184
https://www.genome.jp/dbget-bin/www_bfind?carbbank
 Glycome DB – http://www.glycome-db.org/
Metabolic or Enzyme Databases
 BRENDA – https://www.brenda-enzymes.org/
 KEGG – https://www.genome.jp/kegg/
Structure Databases:
 CATH – https://www.cathdb.info/
 SCOP- http://scop.mrc-lmb.cam.ac.uk/
SCOP
 Structural Classification of Proteins
 http://scop.mrc-lmb.cam.ac.uk/
 The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins of known structure
 Levels of hierarchy
 Family: pairwise residue identities of 30% or greater
 Superfamily: low sequence identities, but probably a common evolutionary origin
e.g., the ATPase domains of the heat shock protein (HSP) and hexokinase (HK)
 Fold: major structural similarity
 Class: all-α, all-β, α/β, α+β, multi-domain
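The family-level criterion (pairwise residue identity of 30% or greater) presupposes a way to compute percent identity from an alignment. A minimal sketch follows; it assumes the two sequences are already aligned (gaps written as `-`) and divides identical positions by the number of gap-free columns, which is only one of several conventions in use. The helper names are hypothetical, and SCOP itself assigns families by expert curation, not by this calculation alone.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions as a percentage of gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Convention assumed here: columns containing a gap are ignored
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def meets_family_threshold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Hypothetical check against the 30% family-level identity criterion."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACDEFGHIKL", "ACDQFGHWKL"))  # 80.0
```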
CATH
 https://www.cathdb.info/
 Class: secondary-structure content
 Architecture: gross orientation of the secondary structures, independent of connectivity
 Topology (fold family): overall shape and connectivity of the secondary structures; groups the superfamilies that share this topology
 S level: sequence and structural identities
Basis of Sequence Alignment
1. Aligning sequences
2. Finding whether proteins or genes are related, i.e. whether they share a common ancestor.
3. Mutations in the sequences bring about changes, or divergence, in the sequences.
4. Alignment can also reveal the part of the sequence that is crucial for the function of the gene or protein.
 Similarity indicates conserved function
 Human and mouse genes are more than 80% similar
 Comparing sequences helps us understand function
Sequence Alignment
 After obtaining nucleotide/amino acid sequences, the first step is to compare them with known sequences. The comparison is done residue by residue, at the level of the constituents; conserved residues are then identified to predict the nature and function of the protein. This process of mapping is called sequence alignment.
1. Local alignment – Smith & Waterman algorithm
2. Global alignment – Needleman & Wunsch algorithm
 Gapped Alignment
 Ungapped Alignment
 Terms to Know - Homolog, Ortholog, Paralog, Xenolog, Similar and Identical
Alignment scoring and substitution matrices
Dot plots
Dynamic programming algorithm
Heuristic methods (In order to reduce time)
FASTA
BLAST
Pairwise sequence alignment
Multiple sequence alignment
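The practical difference between local (Smith & Waterman) and global alignment shows up in a score-only sketch of the local algorithm: every cell is floored at zero, so a poorly matching prefix never drags down a good local match, and the answer is the highest cell anywhere in the matrix rather than the bottom-right one. The scoring values (+1 match, −1 mismatch, −2 per gap position, linear gap penalty) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a new local alignment start at any cell
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("GGGACGTAAA", "TTACGTCC"))  # 4 (the shared "ACGT")
```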
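Scoring a sequence alignment
Match score: +1
Mismatch score: 0
Gap penalty: –1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
Matches: 18 × (+1)
Mismatches: 2 × 0
Gaps: 7 × (–1)
Score = +11
 Maximum number of matches gives high similarity – optimum alignment
 With a uniform gap penalty, one long gap costs the same as several scattered gaps. Below, the first and third sequences are each aligned to the middle one with 20 matches and 7 gap positions, so both score 20 − 7 = 13, although a single long gap (one insertion/deletion event) is usually the more plausible alignment:
ACGTCTGAT-------ATAGTCTATCT
ACGTCTGATACGCCGTATAGTCTATCT
AC-T-TGA--CG-CGT-TA-TCTATCT
 Preferring the single long gap can be achieved by penalizing the opening of a new gap more than the extension of an existing gap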
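The scoring arithmetic can be checked with a short sketch. `score_alignment` applies the slide's scheme (+1 match, 0 mismatch, −1 per gap position) column by column; `affine_score` is a hypothetical variant that charges more for opening a gap than for extending it (−3 for the first gap position, −1 for each further one; both values are assumptions), which is what separates one long gap from several scattered ones.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Column-by-column score with a linear gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def affine_score(top: str, bottom: str, match: int = 1, mismatch: int = 0,
                 gap_open: int = -3, gap_extend: int = -1) -> int:
    """Assumed affine scheme: first gap position costs gap_open, later ones gap_extend."""
    score = 0
    in_gap_top = in_gap_bottom = False
    for x, y in zip(top, bottom):
        if x == "-":
            score += gap_extend if in_gap_top else gap_open
            in_gap_top, in_gap_bottom = True, False
        elif y == "-":
            score += gap_extend if in_gap_bottom else gap_open
            in_gap_top, in_gap_bottom = False, True
        else:
            score += match if x == y else mismatch
            in_gap_top = in_gap_bottom = False
    return score


ref = "ACGTCTGATACGCCGTATAGTCTATCT"
print(score_alignment(ref, "----CTGATTCGC---ATCGTCTATCT"))  # 11
one_gap = "ACGTCTGAT-------ATAGTCTATCT"
scattered = "AC-T-TGA--CG-CGT-TA-TCTATCT"
print(score_alignment(one_gap, ref), score_alignment(scattered, ref))  # 13 13
print(affine_score(one_gap, ref), affine_score(scattered, ref))  # 11 1
```

Under the linear scheme the two gapped alignments tie; the affine variant separates them clearly.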
 Scores:
 positive for identical or similar
 negative for different
 negative for insertion in one of the two sequences
 Substitution matrices – weight the replacement of one residue by another
 assumption of evolution by point mutations
 amino acid replacement (by base replacement)
 amino acid insertion
 amino acid deletion
 Significance of alignment
 Depends critically on the gap penalty
 The gap penalty needs to be adjusted to the given sequences
Derivation of substitution matrices
PAM matrices
 The first substitution matrices, developed by Dayhoff (1978) based on the Point Accepted Mutation (PAM) model of evolution
 1 PAM is a unit of evolutionary divergence in which 1% of the amino acids have been changed
 Derived from alignments of very similar sequences
 PAM1 = mutation events that change 1% of the amino acids
 PAM2, PAM3, ... are extrapolated by matrix multiplication, e.g. PAM2 = PAM1 × PAM1; PAM3 = PAM2 × PAM1, and in general PAMn = (PAM1)^n
 Low-distance PAM matrices for closely related proteins, e.g. PAM30
 High-distance PAM matrices for highly diverged sequences, e.g. PAM250
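The extrapolation PAMn = (PAM1)^n can be sketched with plain matrix multiplication. The 3-residue matrix below is a made-up toy (each row gives the probability that a residue is replaced by each residue after 1 PAM, and each row sums to 1); Dayhoff's real PAM1 covers 20 amino acids, and the familiar PAM scoring matrices are log-odds matrices derived from such mutation probabilities.

```python
def mat_mult(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def pam_n(pam1, n):
    """Extrapolate PAMn by multiplying PAM1 by itself n times (PAM0 = identity)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical toy matrix: row = original residue, column = replacement residue;
# 1% of each residue changes per 1 PAM of divergence.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.003, 0.007, 0.990],
]

for n in (1, 30, 250):
    # The diagonal (chance a residue is unchanged) shrinks as divergence grows
    print(n, [round(row[idx], 3) for idx, row in enumerate(pam_n(TOY_PAM1, n))])
```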
Problems with PAM matrices:
 Long-time substitutions are modelled incorrectly, since the short-time data are dominated by conservative mutations that need only a single nucleotide change
 e.g.: L <–> I, L <–> V, Y <–> F
 Over long times, any amino acid change can occur
[Figure: PAM matrix with positive and negative values; the identity score depends on the residue]
BLOSUM matrices
 BLOcks amino acid SUbstitution Matrices
 Similar to PAM; however, the data were derived from local alignments of distantly related proteins deposited in the BLOCKS database
 Unlike PAM, there is no explicit evolutionary model behind them
 BLOSUM series (BLOSUM50, BLOSUM62, ...)
 BLOCKS database:
 ungapped multiple alignments of protein families at a given identity
E.g.,
 BLOSUM30 is better for gapped alignments, for comparing highly diverged sequences
 BLOSUM90 is better for ungapped alignments, for very close sequences
 BLOSUM62 was derived from a set of sequences that are at most 62% identical
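How a BLOSUM-style score is obtained from a block can be sketched as follows: count residue pairs within each column, turn them into observed pair frequencies q, derive background frequencies p, and score each pair as a log-odds ratio in half-bit units, s = round(2 × log2(q / e)). This is a simplification: the real procedure also clusters sequences above the chosen identity level (e.g. 62%) and pools counts over many blocks. The toy block and the function name are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block):
    """Half-bit log-odds scores from one ungapped block (list of equal-length rows)."""
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):  # every pair of rows in this column
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}  # observed pair frequencies

    # Background frequency p_i = q_ii + (sum over j != i of q_ij) / 2
    p = Counter()
    for (x, y), freq in q.items():
        p[x] += freq / 2
        p[y] += freq / 2

    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]  # chance pairing
        scores[(x, y)] = round(2 * math.log2(freq / expected))  # half-bit units
    return scores


print(blosum_like_scores(["AAS", "AAS", "ASS"]))
# {('A', 'A'): 1, ('A', 'S'): -2, ('S', 'S'): 2}
```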
DOT Plot
 Simple comparison without alignment
 2D graphical representation method primarily used for finding regions of
local matches between two sequences
 DOTTER, PALIGN, DOTLET (https://dotlet.vital-it.ch/)
 Distinguish by alignment score
 Similarities increase score (positive)
 Mismatches decrease score (Negative)
 Gaps decrease score
Expected number of random dots = (probability of a matching pair) × (length of seq A) × (length of seq B)
Disadvantages – does not directly establish sequence homology, and is statistically weak
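A minimal dot-plot sketch: a dot is placed at (i, j) whenever residue i of sequence A matches residue j of sequence B, optionally requiring `threshold` identities inside a sliding `window` (a common way of filtering noise). `expected_random_dots` applies the expected-dots formula; the uniform match probability of 1/4 for random DNA in the usage line is an assumption, and all function names are hypothetical.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, threshold: int = 1):
    """Set of (i, j) where windows starting at seq_a[i] and seq_b[j] share >= threshold identities."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            identities = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if identities >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a: str, seq_b: str, dots) -> str:
    """ASCII dot plot: seq_a down the side, seq_b across the top."""
    lines = ["  " + seq_b]
    for i, residue in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


def expected_random_dots(len_a: int, len_b: int, p_match: float) -> float:
    """Expected chance dots (window = 1) = p x len(A) x len(B)."""
    return p_match * len_a * len_b


print(render("GATTACA", "GATTACA", dot_plot("GATTACA", "GATTACA")))
print(expected_random_dots(100, 200, 0.25))  # 5000.0
```

Matching regions appear as diagonal runs of `*`; isolated dots are mostly chance matches.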
Dynamic programming algorithm
 To build an optimal alignment that maximizes the similarity, we need a scoring method
 Dynamic programming relies on the principle of optimality: an optimal alignment is built up from optimal alignments of shorter prefixes of the two sequences.
PROCEDURE
 Construct a two-dimensional matrix whose axes are the two sequences to be compared.
 The scores are calculated one row at a time: the first residue of one sequence is scanned against the entire length of the other sequence, followed by the second residue, and so on.
 Scanning of the second row takes into account the scores already obtained in the first row; each cell receives the best score obtainable from its neighbouring cells (diagonal for a match or mismatch, above or left for a gap).
 This process is iterated until all cells are filled; for a global alignment, the best overall score ends up in the bottom-right corner of the matrix.
Depicting the results:
 Back tracing
 The best matching path is the one that has the maximum total score.
 If two or more paths reach the same highest score, one is chosen
arbitrarily to represent the best alignment.
 The path can also move horizontally or vertically at certain points, which corresponds to the introduction of a gap, i.e. an insertion or deletion in one of the two sequences.
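The fill-and-trace-back procedure can be sketched as a global (Needleman & Wunsch) alignment using the +1 match, 0 mismatch, −1 gap scheme from the scoring example. When several moves tie during traceback, this sketch prefers the diagonal, then a gap in the second sequence; such tie-breaking is arbitrary. The function name is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap  # leading gaps in b
    for j in range(1, m + 1):
        f[0][j] = j * gap  # leading gaps in a

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell takes the best of diagonal, vertical and horizontal moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + sub(i, j), f[i - 1][j] + gap, f[i][j - 1] + gap)

    # Trace back from the bottom-right cell to the top-left corner
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return f[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```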
BLAST
 Basic Local Alignment search tool
 https://blast.ncbi.nlm.nih.gov/Blast.cgi
 Multi-step approach to find high-scoring local alignments between two sequences
 List words of fixed length (3 amino acids or 11 nucleotides) expected to give a score larger than a threshold (seed alignment)
 For every word, search the database and extend the ungapped alignment in both directions up to a certain length to obtain high-scoring segment pairs (HSPs)
 New versions of BLAST allow gaps
Blastn: nucleotide query – nucleotide database
Blastp: protein query – protein database
tBlastn: protein query – translated nucleotide database
Blastx: translated nucleotide query – protein database
tBlastx: translated nucleotide query – translated nucleotide database
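The seed-and-extend idea can be illustrated with a toy sketch: exact word hits of length w (real BLAST also accepts high-scoring "neighbourhood" words for proteins), followed by ungapped extension in both directions that stops once the running score drops more than `x_drop` below the best seen (an X-drop rule). All parameter values (w = 4, +1/−3 scoring, x_drop = 5) and function names are illustrative assumptions, not NCBI BLAST defaults or code.

```python
def find_seeds(query: str, subject: str, w: int):
    """All (i, j) where query and subject share an identical word of length w."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1) for j in index.get(query[i:i + w], [])]


def extend(query, subject, i, j, w, match=1, mismatch=-3, x_drop=5):
    """Ungapped X-drop extension of a seed; returns (score, q_start, q_end, s_start)."""
    def one_direction(pairs):
        best, score, best_len = 0, 0, 0
        for length, (x, y) in enumerate(pairs, start=1):
            score += match if x == y else mismatch
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break  # fell too far below the best score: stop extending
        return best, best_len

    best_r, len_r = one_direction(zip(query[i + w:], subject[j + w:]))
    best_l, len_l = one_direction(zip(reversed(query[:i]), reversed(subject[:j])))
    return w * match + best_l + best_r, i - len_l, i + w + len_r, j - len_l


def best_hsp(query: str, subject: str, w: int = 4):
    """Highest-scoring HSP over all seeds, or None if there is no seed."""
    hsps = {extend(query, subject, i, j, w) for i, j in find_seeds(query, subject, w)}
    return max(hsps) if hsps else None


print(best_hsp("GGACGTACGTCC", "TTACGTACGTAA"))  # (8, 2, 10, 2)
```

The returned tuple reads: score 8, query positions 2 to 10 (the shared `ACGTACGT`), starting at subject position 2.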
Interpretation
 Rapid and easy way to find homologs by scanning huge databases
 Search against specialized db
 BLAST employs the SEG program to filter low-complexity regions before executing the database search
 Quality of the alignment is represented by score (to identify hits)
 Significance of the alignment is represented as e-value (Expected value)
 E-value decreases exponentially as the score increases
 The E-value indicates how likely it is that a given sequence match arose purely by chance. The lower the E-value, the less likely the match is due to chance, and therefore the more significant it is.
 If E is between 0.01 and 10, the match is considered not significant.
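The exponential relationship between score and E-value comes from Karlin–Altschul statistics, E = K · m · n · e^(−λS), where S is the raw score, m and n are the query and database lengths, and K and λ depend on the scoring system. The sketch below also converts a raw score to a bit score S′ = (λS − ln K) / ln 2, so that E = m · n · 2^(−S′). The λ and K values are illustrative placeholders, not values for any particular matrix.

```python
import math


def e_value(raw_score, m, n, lam, k):
    """Karlin-Altschul expected number of chance hits: K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """Equivalent form E = m * n * 2^(-S')."""
    return m * n * 2 ** (-bits)


LAMBDA, K = 0.3, 0.1  # hypothetical statistical parameters, for illustration only
QUERY_LEN, DB_LEN = 250, 10_000_000

for s in (30, 60, 90):
    bits = bit_score(s, LAMBDA, K)
    print(s, f"{bits:.1f} bits", f"E = {e_value(s, QUERY_LEN, DB_LEN, LAMBDA, K):.2e}")
```

Each additional bit of score halves the E-value, which is why it falls off so quickly.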
FASTA
 More sensitive than BLAST
 Uses a lookup table to locate all identically matching words of length ktup between two sequences
 BLAST – hit extension step
 FASTA – exact word match
 As ktup increases, the search becomes faster but less sensitive
 FASTA also uses E-values and bit scores. The FASTA
output provides one more statistical parameter,
the Z-score.
 If Z is in the range of 5 to 15, the sequence pair can be described as highly probable homologs. If Z < 5, their relationship is described as less certain
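The k-tuple lookup step of FASTA can be sketched as an index of every ktup-length word in the query. Scanning the library sequence then yields hits, and hits that fall on the same diagonal (offset j − i) point to a likely region of similarity. The `z_score` helper shows only the general idea, (score − mean) / standard deviation over scores of unrelated sequences; FASTA's own Z-score also corrects for sequence length, so treat it as a simplified, hypothetical helper.

```python
import statistics
from collections import Counter, defaultdict


def ktup_table(seq: str, ktup: int):
    """Map each word of length ktup to the positions where it occurs."""
    table = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i)
    return table


def diagonal_hits(query: str, library_seq: str, ktup: int = 2) -> Counter:
    """Count exact ktup hits per diagonal (offset = library position - query position)."""
    table = ktup_table(query, ktup)
    hits = Counter()
    for j in range(len(library_seq) - ktup + 1):
        for i in table.get(library_seq[j:j + ktup], []):
            hits[j - i] += 1
    return hits


def z_score(score: float, unrelated_scores) -> float:
    """Simplified Z-score against a background of unrelated-sequence scores."""
    return (score - statistics.mean(unrelated_scores)) / statistics.stdev(unrelated_scores)


print(diagonal_hits("ACGTACGT", "TTACGTAC", ktup=2).most_common(1))  # [(2, 5)]
```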
Phylogenetics
 Phylogenetics is the study of evolutionary relatedness among various groups of
organisms (e.g., species, populations).
 Methods of Phylogenetic Analysis:
Phenetic: NJ, UPGMA
Cladistic: MP, ML
 Monophyletic group – all taxa descended from one common ancestor, including all of its descendants
 Paraphyletic grouping – taxa share a common ancestor, but not all of its descendants are included
 Errors in the alignment can mislead the tree
 A phylogenetic tree is a tree showing the evolutionary interrelationships among various species or other entities that are believed to have a common ancestor. A phylogenetic tree is a form of cladogram. In a rooted phylogenetic tree, each node with descendants represents the most recent common ancestor of those descendants, and in some trees the edge lengths correspond to time estimates.
 Each node in a phylogenetic tree is called a taxonomic unit. Internal nodes are generally referred to as Hypothetical Taxonomic Units (HTUs), as they cannot be directly observed.
 Distances – number of changes
Parts of a phylogenetic tree
[Figure: tree diagram labelling node, root, outgroup, ingroup and branch]
Phenetic Method of analysis:
 Also known as numerical taxonomy
 Involves various measures of overall similarity for ranking species
 All data are first converted to numerical values without any character weighting. Then the number of similarities/differences is calculated.
 Similar taxa are then clustered, i.e. grouped close together
 Lack of evolutionary significance in phenetics
Cladistic method of analysis:
 Alternative approach
 Diagramming relationship between taxa
 Basic assumption – members of the group share a common evolutionary
history
 Typically based on morphological data
Distance and Character
A tree can be based on
 1. quantitative measures like the distance or similarity between species, or
 2. based on qualitative aspects like common characters.
 Molecular clock assumption – substitutions in the nucleotides/amino acids being compared occur at a constant rate
Maximum Parsimony:
 Finds the optimum tree by minimizing the number of evolutionary changes
 No assumptions on the evolutionary pattern
 Starts from a multiple sequence alignment (MSA), then scores candidate trees
 Rather time-consuming; works well if the sequences have strong similarity
 May oversimplify evolution
 May produce several equally good trees
 PAUP, MacClade
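Scoring one candidate tree under maximum parsimony can be sketched with Fitch's algorithm, which counts the minimum number of changes needed at each alignment column on a fixed, rooted binary tree; a full parsimony analysis repeats this over many candidate topologies and keeps the lowest total. The four-taxon alignment and function names are made up for illustration.

```python
def fitch(node):
    """Return (candidate states, minimum changes) for one site; leaves are 1-char states."""
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    states_l, cost_l = fitch(left)
    states_r, cost_r = fitch(right)
    shared = states_l & states_r
    if shared:
        return shared, cost_l + cost_r
    return states_l | states_r, cost_l + cost_r + 1  # disjoint sets force one change


def tree_length(topology, alignment):
    """Total Fitch changes over all columns; topology is nested tuples of taxon names."""
    def site_tree(node, col):
        if isinstance(node, str):
            return alignment[node][col]  # replace a taxon name by its state in this column
        return tuple(site_tree(child, col) for child in node)

    n_cols = len(next(iter(alignment.values())))
    return sum(fitch(site_tree(topology, col))[1] for col in range(n_cols))


ALN = {"t1": "AAG", "t2": "AAA", "t3": "GTA", "t4": "GTG"}
print(tree_length((("t1", "t2"), ("t3", "t4")), ALN))  # 4
print(tree_length((("t1", "t3"), ("t2", "t4")), ALN))  # 6
```

The first topology needs fewer changes, so parsimony would prefer it.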
Maximum Likelihood:
 The best tree is found based on assumptions on evolution model
 Nucleotide models are more advanced at the moment than amino acid models
 Programs require a lot of computing capacity
Neighbour Joining:
 The sequences that should be joined are chosen to give the best least-squares estimates of the
branch length that most closely reflect the actual distances between the sequences
 NJ method begins by creating a star topology in which no neighbours are connected
 The tree is then modified by successively joining pairs of sequences; the pair to be joined is the one that gives the smallest sum of branch lengths
 Distance table
 No molecular clock assumed
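The joining procedure can be sketched with the Q-criterion commonly used in Saitou and Nei's method, Q(i, j) = (n − 2)·d(i, j) − r_i − r_j, where r_i is the sum of distances from i; minimizing Q is equivalent to choosing the pair that gives the smallest total branch length. Ties are broken by scan order, the five-taxon distance table is a toy example, and the output is a Newick string for an unrooted tree. Function names are hypothetical.

```python
def from_pairs(pairs):
    """Build a symmetric dict-of-dicts distance table from {(x, y): distance}."""
    table = {}
    for (x, y), dist in pairs.items():
        table.setdefault(x, {})[y] = dist
        table.setdefault(y, {})[x] = dist
    return table


def neighbor_joining(dist):
    """Return a Newick string for the unrooted NJ tree of a distance table."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    while len(d) > 2:
        n = len(d)
        r = {a: sum(row.values()) for a, row in d.items()}  # net divergence r_i
        # Join the pair with the smallest Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(((a, b) for a in d for b in d if a < b),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the joined taxa to their new parent node u
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {}
        for k in list(d):
            if k not in (i, j, u):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        del d[i], d[j]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
    a, b = d  # the last two nodes are connected by a single branch
    return f"({a},{b}:{d[a][b]:g});"


DIST = from_pairs({("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
                   ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
                   ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3})
print(neighbor_joining(DIST))  # ((d:2,e:1),((a:2,b:3):3,c:4):2);
```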
UPGMA
 Unweighted Pair Group method with Arithmetic Mean
 Works by clustering, starting with the most similar sequences and proceeding towards the more distant ones
 Dot representation
 Molecular clock assumed
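A minimal UPGMA sketch: repeatedly merge the closest pair of clusters, place the new node at half their distance (this is where the molecular-clock assumption enters, since both children sit at the same depth), and set the distance from the new cluster to every other cluster to the size-weighted average. The dict-of-dicts input format and the function name are assumptions for illustration.

```python
def upgma(dist):
    """Return a Newick string for the UPGMA tree of a symmetric dict-of-dicts distance table."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    size = {a: 1 for a in d}       # number of taxa in each cluster
    height = {a: 0.0 for a in d}   # depth of each cluster's node above the leaves
    while len(d) > 1:
        i, j = min(((a, b) for a in d for b in d if a < b), key=lambda p: d[p[0]][p[1]])
        h = d[i][j] / 2  # molecular clock: the new node is equidistant from both clusters
        u = f"({i}:{h - height[i]:g},{j}:{h - height[j]:g})"
        d[u] = {}
        for k in list(d):
            if k not in (i, j, u):
                # Arithmetic mean of all pairwise distances, weighted by cluster size
                d[u][k] = d[k][u] = (size[i] * d[i][k] + size[j] * d[j][k]) / (size[i] + size[j])
        size[u], height[u] = size[i] + size[j], h
        del d[i], d[j]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
    (root,) = d
    return root + ";"


TOY = {"a": {"b": 2, "c": 6}, "b": {"a": 2, "c": 6}, "c": {"a": 6, "b": 6}}
print(upgma(TOY))  # ((a:1,b:1):2,c:3);
```

Unlike the neighbour-joining sketch, every leaf ends up at the same distance from the root, which is exactly the clock assumption.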
PHYLIP (Phylogeny Inference Package)
 Available free for Windows, macOS and Linux
 Parsimony, distance matrix and likelihood methods (bootstrapping and
consensus trees)
 Data can be molecular sequences, gene frequencies, restriction sites and
fragments, distance matrices and discrete characters
plant biotechnology Lecture note ppt.pptx
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 

Bioinformatics

  • 2. Introduction to Bioinformatics
    - Biology, chemistry, statistics and computer science combined to solve complex biological problems
    - Algorithms and techniques of computer science used to solve the problems faced by molecular biologists
    - ‘Information technology applied to the management and analysis of biological data’
    - Storage and analysis are two important functions; bioinformaticians build tools for each
    - The bio-IT market has grown significantly in the genomic era
  • 3. Fields of Bioinformatics
    - The need for bioinformatics has arisen from the recent explosion of publicly available genomic information, such as that resulting from the Human Genome Project.
    - Gain a better understanding of gene analysis, taxonomy and evolution.
    - Work efficiently on rational drug design and reduce the time taken to develop drugs manually.
    - Unravel the wealth of biological information hidden in the mass of sequence, structure, literature and other biological data.
    - Environmental clean-up
    - Agriculture: producing high-productivity crops
    - Gene therapy
    - Forensic analysis
    - Understanding biological pathways and networks in systems biology
  • 4. Bioinformatics key areas
    [Figure: organisation of knowledge (sequences, structures, functional data), e.g. homology searches. Source: Bioinformatics lecture, March 5, 2002]
  • 5. Applications of Bioinformatics
    - Provides central, globally accessible databases that enable scientists to submit, search and analyze information, and offers software for data studies, modelling and interpretation.
    - Sequence analysis: uses sequencing data to identify genes and the regulatory sequences or peptides they encode. These computational tools also detect DNA mutations in an organism and identify related sequences. Specialized software detects overlaps between sequenced fragments and assembles them.
    - Prediction of protein structure: the primary structure of a protein (its amino acid sequence) can readily be inferred from the coding DNA sequence, but its secondary, tertiary and quaternary structures are difficult to determine. Bioinformatics tools can be used to predict these complex structures.
    - Genome annotation: genomes are marked up to locate protein-coding genes and regulatory sequences. Annotation was a very important part of the Human Genome Project because it identifies the regulatory sequences.
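Fragment assembly can be illustrated with a greedy overlap-and-merge sketch: repeatedly join the two reads with the longest suffix/prefix overlap. This is a simplified teaching example, not how production assemblers work. It assumes error-free reads, no read contained in another, and a minimum overlap length (`min_len`); the function names and the reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        # find the first min_len characters of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # does the remainder of a, from this position, form a prefix of b?
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair of reads with the longest overlap until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATC", "AATCCG"]))  # ['ATGGCGTGCAATCCG']
```

Greedy merging is easy to follow but can mis-assemble repeats; real assemblers use overlap graphs or de Bruijn graphs instead.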
  • 6.
    - Comparative genomics: the branch of bioinformatics that studies the relationship between genomic structure and function across different biological species, enabling scientists to trace the evolutionary processes that occur in the genomes of different species.
    - Pharmaceutical research: bioinformatics tools also help in drug discovery, diagnosis and disease management. Complete sequencing of human genes has enabled scientists to develop medicines and drugs that can target more than 500 genes. Bioinformatics also enables more accurate prediction in drug screening.
  • 7. Comparison of Unix, Windows and Linux
    S. No | Unix                                              | Windows                                  | Linux
    1.    | Mostly proprietary (open-source variants exist)   | Closed source                            | Open source
    2.    | Very high security                                | Low security                             | High security
    3.    | Command line                                      | GUI                                      | Hybrid (command line and GUI)
    4.    | Single hierarchical file system                   | Separate hierarchy per drive (C:\, D:\)  | Single hierarchical file system
    5.    | Not user friendly                                 | User friendly                            | User friendly
    6.    | Multitasking                                      | Multitasking                             | Multitasking
  • 9. Biological databanks and databases
    - Very fast growth of biological data
    - Diversity of biological data:
      - Primary sequences
      - 3D structures
      - Functional data
    - Database entry is usually required for publication:
      - Sequences
      - Structures
    Major primary databases:
    Nucleic acid     | Protein
    EMBL (Europe)    | PIR (Protein Information Resource)
    GenBank (USA)    | MIPS
    DDBJ (Japan)     | SWISS-PROT (University of Geneva, now with EBI)
                     | TrEMBL (a supplement to SWISS-PROT)
                     | NRL-3D
  • 10. Sequence Databases
    - The three nucleotide databanks exchange data on a daily basis
    - Data can be submitted to and accessed at any of the three locations
    Nucleotide databases:
      - GenBank: https://www.ncbi.nlm.nih.gov/
      - EMBL: https://www.ebi.ac.uk/
      - DDBJ: https://www.ddbj.nig.ac.jp/index-e.html
    Bibliographic databases:
      - PubMed, Medline
    Specialized databases:
      - RDP, IMGT, TRANSFAC, MitBase
    Genetic databases:
      - SGD: https://www.yeastgenome.org/
      - ACeDB, OMIM
  • 11. Composite and Secondary Databases
    - Composite databases combine several primary databases, e.g. Swiss-Prot, PIR, GenBank and NRL-3D (OWL is such a composite database).
    - Secondary databases store structural information or the results of searches of the primary databases:
      Secondary database | Primary source | URL
      PROSITE            | SWISS-PROT     | https://prosite.expasy.org/
      PRINTS             | OWL            | http://130.88.97.239/PRINTS/index.php
  • 12. Biological Databases
    Carbohydrate structure databases:
      - CCSD: https://cordis.europa.eu/project/id/BIOT0184, https://www.genome.jp/dbget-bin/www_bfind?carbbank
      - Glycome DB: http://www.glycome-db.org/
    Metabolic or enzyme databases:
      - BRENDA: https://www.brenda-enzymes.org/
      - KEGG: https://www.genome.jp/kegg/
    Structure databases:
      - CATH: https://www.cathdb.info/
      - SCOP: http://scop.mrc-lmb.cam.ac.uk/
  • 13. SCOP
    - Structural Classification of Proteins
    - http://scop.mrc-lmb.cam.ac.uk/
    - The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins of known structure.
    - Levels of hierarchy:
      - Family: pairwise amino acid sequence identity of 30% or greater
      - Superfamily: low sequence identity, but a probable common evolutionary origin, e.g. the ATPase domains of heat shock protein (HSP) and hexokinase (HK)
      - Fold: major structural similarity
      - Class: all-α, all-β, α/β (interspersed α-helices and β-strands), α+β (segregated α and β regions), multi-domain
  • 14. CATH
    - https://www.cathdb.info/
    - Class: secondary-structure content
    - Architecture: gross orientation of the secondary structures, independent of their connectivity
    - Topology (fold family): topological connections of the secondary structures; each topology groups one or more superfamilies
    - S level: sequence and structural identity
  • 15. Basis of Sequence Alignment
    1. Aligning sequences
    2. Finding the relatedness of proteins or genes, i.e. whether or not they share a common ancestor
    3. Mutations bring changes, or divergence, in the sequences
    4. Revealing the parts of a sequence that are crucial for the function of a gene or protein
    - Similarity often indicates conserved function
    - Human and mouse genes are more than 80% similar
    - Comparing sequences helps us understand function
  • 16. Sequence Alignment
    - After obtaining nucleotide or amino acid sequences, the first step is to compare them with known sequences. The comparison is done residue by residue; conserved residues are then used to predict the nature and function of the protein. This mapping process is called sequence alignment.
      1. Local alignment: Smith-Waterman algorithm
      2. Global alignment: Needleman-Wunsch algorithm
    - Gapped alignment
    - Ungapped alignment
    - Terms to know: homolog, ortholog, paralog, xenolog, similar and identical
    [Figure: overview of sequence alignment: alignment scoring and substitution matrices; dot plots; dynamic programming algorithm; heuristic methods (to reduce time): FASTA, BLAST; pairwise sequence alignment; multiple sequence alignment]
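Local alignment in the Smith-Waterman style can be sketched as dynamic programming in which every cell is floored at zero, so an alignment may start anywhere; traceback starts at the highest-scoring cell and stops at a zero. This is a minimal sketch with a linear gap penalty; the default scores are arbitrary assumptions, and the usage line uses match +3, mismatch -3, gap -2.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # zero means "start a new local alignment here"
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # trace back from the best cell until a zero cell is reached
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
# (13, 'GTT-AC', 'GTTGAC')
```

A global (Needleman-Wunsch) alignment differs mainly in dropping the zero floor, initializing the first row and column with gap penalties, and tracing back from the bottom-right cell.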
  • 17. Scoring a sequence alignment
    Scoring scheme: match = +1, mismatch = 0, gap penalty = -1
      ACGTCTGATACGCCGTATAGTCTATCT
          ||||| |||   || ||||||||
      ----CTGATTCGC---ATCGTCTATCT
    Matches: 18 × (+1); mismatches: 2 × 0; gaps: 7 × (-1); score = +11
    - The alignment with the highest score (maximum number of matches) is the optimum alignment.
    - The same number of gaps can be placed as one long gap or as many scattered gaps, e.g. against ACGTCTGATACGCCGTATAGTCTATCT:
      ACGTCTGAT-------ATAGTCTATCT
      AC-T-TGA--CG-CGT-TA-TCTATCT
      A single long gap is usually the more plausible of the two; we can favour it by penalizing the opening of a new gap more than the extension of an existing gap.
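The arithmetic on this slide can be checked by scoring each column, and the same function shows how a gap-opening penalty separates the two alternative gap placements. This is a sketch; the affine values (`gap_open=-3`, `gap_extend=-1`) are illustrative assumptions, not values from the slide.

```python
def alignment_score(top, bottom, match=1, mismatch=0, gap_open=-1, gap_extend=-1):
    """Score a pairwise alignment column by column.

    With gap_open == gap_extend the gap penalty is linear (as on the slide);
    with gap_open more negative than gap_extend, opening a gap costs more (affine).
    """
    score = 0
    prev_gap_row = None                 # which row held a gap in the previous column
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            score += gap_extend if row == prev_gap_row else gap_open
            prev_gap_row = row
        else:
            score += match if x == y else mismatch
            prev_gap_row = None
    return score


ref = "ACGTCTGATACGCCGTATAGTCTATCT"
print(alignment_score(ref, "----CTGATTCGC---ATCGTCTATCT"))   # 11, as on the slide
one_gap = "ACGTCTGAT-------ATAGTCTATCT"
scattered = "AC-T-TGA--CG-CGT-TA-TCTATCT"
for alt in (one_gap, scattered):
    linear = alignment_score(alt, ref)
    affine = alignment_score(alt, ref, gap_open=-3, gap_extend=-1)
    print(linear, affine)   # one_gap: 13 11, scattered: 13 1
```

Under the linear penalty both placements score 13; only the affine penalty prefers the single long gap.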
  • 18.
    - Scores:
      - positive for identical or similar residues
      - negative for different residues
      - negative for an insertion in one of the two sequences
    - Substitution matrices weight the replacement of one residue by another, assuming evolution by point mutations:
      - amino acid replacement (by base replacement)
      - amino acid insertion
      - amino acid deletion
    - Significance of an alignment:
      - depends critically on the gap penalty
      - the penalty needs to be adjusted to the given sequences
  • 19. Derivation of substitution matrices
    PAM matrices
    - The first substitution matrix, developed by Dayhoff (1978) on the Point Accepted Mutation (PAM) model of evolution
    - 1 PAM is a unit of evolutionary divergence in which 1% of the amino acids have been changed
    - Derived from alignments of very similar sequences
    - PAM1 = mutation events that change 1% of the amino acids
    - PAM2, PAM3, ... are extrapolated by matrix multiplication, e.g. PAM2 = PAM1 × PAM1; PAM3 = PAM2 × PAM1, etc.
    - Lower-distance PAM matrices for closely related proteins, e.g. PAM30
    - Higher-distance PAM matrices for highly diverged sequences, e.g. PAM250
    Problems with PAM matrices:
    - Incorrect modelling of long-time substitutions: over short times, conservative mutations caused by single-nucleotide changes dominate (e.g. L <-> I, L <-> V, Y <-> F), whereas over long times any amino acid change can occur.
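The extrapolation step (PAMn = PAM1 multiplied by itself n times) and the conversion to log-odds scores can be sketched on a toy alphabet. Everything numeric here is hypothetical: a 3-letter alphabet, made-up PAM1 probabilities with 1% change per row, uniform background frequencies, and a 10 × log10 scaling assumed for the scores. Real PAM matrices are 20 × 20 and use observed amino acid frequencies.

```python
import math


def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_n(pam1, n):
    """PAMn: PAM1 multiplied by itself n times (starting from the identity)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, pam1)
    return result


def log_odds(M, background):
    """Log-odds scores 10 * log10(M[i][j] / f_j), rounded to integers."""
    size = len(M)
    return [[round(10 * math.log10(M[i][j] / background[j])) for j in range(size)]
            for i in range(size)]


# Hypothetical PAM1: row = original residue, column = residue after 1 PAM
PAM1 = [[0.990, 0.006, 0.004],
        [0.005, 0.990, 0.005],
        [0.002, 0.008, 0.990]]
background = [1 / 3, 1 / 3, 1 / 3]   # assumed uniform residue frequencies

PAM250 = pam_n(PAM1, 250)
print([round(PAM250[i][i], 3) for i in range(3)])   # identity probabilities shrink with distance
print(log_odds(PAM1, background))                   # strongly positive diagonal, negative off-diagonal
print(log_odds(PAM250, background))                 # scores flatten for distant relationships
```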
  • 20. [Figure: substitution matrix; entries take positive and negative values, and the identity score depends on the residue]
  • 21. BLOSUM matrices
    - BLOcks SUbstitution Matrices
    - Similar in purpose to PAM; however, the data were derived from local alignments of distantly related proteins deposited in the BLOCKS database
    - Unlike PAM, BLOSUM is not extrapolated from an explicit evolutionary model
    - BLOSUM series (BLOSUM50, BLOSUM62, ...)
    - BLOCKS database: ungapped multiple alignments of protein families at a given identity, e.g.:
      - BLOSUM30: better for gapped alignments, for comparing highly diverged sequences
      - BLOSUM90: better for ungapped alignments of very close sequences
      - BLOSUM62: derived from blocks after clustering sequences that are more than 62% identical, so it reflects substitutions between sequences that are 62% or less identical
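How log-odds scores come out of an ungapped block can be sketched by counting residue pairs column by column. This sketch assumes a tiny hypothetical block and scores in half-bit units (2 × log2 of observed over expected pair frequency); it deliberately omits the clustering of sequences above the identity threshold (e.g. 62%) that distinguishes one BLOSUM matrix from another.

```python
from collections import Counter
from itertools import combinations
from math import log2


def blosum_scores(block):
    """Half-bit log-odds scores from an ungapped block of equal-length sequences."""
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total_pairs = sum(pair_counts.values())
    q = {pair: c / total_pairs for pair, c in pair_counts.items()}   # observed pair frequencies
    # background frequency p_i = q_ii + half of every q_ij with j != i
    p = Counter()
    for (x, y), f in q.items():
        p[x] += f / 2
        p[y] += f / 2
    scores = {}
    for (x, y), f in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * log2(f / expected))
    return scores


print(blosum_scores(["AAC", "AAC", "AGC", "GGC"]))   # hypothetical block
```

A conserved column (all C here) yields a high identity score, while residues that often replace each other in the block get positive off-diagonal scores.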
  • 22. Dot plot
    - Simple comparison without alignment
    - A 2D graphical representation, primarily used for finding regions of local matches between two sequences
    - Tools: DOTTER, PALIGN, DOTLET (https://dotlet.vital-it.ch/)
    - Distinguish by alignment score:
      - similarities increase the score (positive)
      - mismatches decrease the score (negative)
      - gaps decrease the score
    - Expected number of random dots = (probability of a matching pair) × (length of sequence A) × (length of sequence B)
    - Disadvantages: does not directly establish sequence homology, and is statistically weak
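A dot plot can be sketched by comparing every window of one sequence with every window of the other and marking cells where enough positions are identical; runs of dots along a diagonal indicate local matches. The `window` and `threshold` parameters and the sequences are illustrative assumptions; the expected-dots helper implements the slide's formula for a window of one residue.

```python
def dot_plot(a, b, window=1, threshold=1):
    """Cells (i, j) where a[i:i+window] and b[j:j+window] share >= threshold identical residues."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            identical = sum(a[i + k] == b[j + k] for k in range(window))
            if identical >= threshold:
                dots.add((i, j))
    return dots


def render(a, b, dots):
    """Text dot plot: rows follow sequence a, columns follow sequence b."""
    lines = ["  " + b]
    for i, ch in enumerate(a):
        lines.append(ch + " " + "".join("*" if (i, j) in dots else "." for j in range(len(b))))
    return "\n".join(lines)


def expected_random_dots(p_pair, len_a, len_b):
    """Expected chance dots = p(pair) x len(A) x len(B), for a window of 1."""
    return p_pair * len_a * len_b


a, b = "GATTACAGATTACA", "CAGATTACAT"
print(render(a, b, dot_plot(a, b)))
print(expected_random_dots(0.25, len(a), len(b)))   # equal base frequencies assumed
```

Raising the window and threshold (e.g. `window=3, threshold=3`) removes most chance dots and leaves the diagonals.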
  • 23. Dynamic programming algorithm
    - To build up the optimal alignment, which maximizes similarity, a scoring method is needed.
    - Dynamic programming relies on a principle of optimality: an optimal alignment is built from optimal alignments of shorter prefixes of the two sequences.
    Procedure:
    - Construct a two-dimensional matrix whose axes are the two sequences to be compared.
    - Scores are calculated one row at a time: the first residue of one sequence is scanned against the entire length of the other sequence, followed by the second row.
    - The scoring of the second row takes into account the scores already obtained in the first row. At each step the best score is placed in the bottom-right cell of the 2 × 2 sub-matrix under consideration, i.e. each cell takes the best score reachable from its diagonal, upper or left neighbour.
    - This process is iterated until all cells are filled.
  • 24. Depicting the results: back tracing
    - The best matching path is the one with the maximum total score.
    - If two or more paths reach the same highest score, one is chosen arbitrarily to represent the best alignment.
    - The path can also move horizontally or vertically at certain points; this corresponds to introducing a gap, i.e. an insertion or deletion in one of the two sequences.
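The fill-and-trace-back procedure of the last two slides can be sketched as a global (Needleman-Wunsch style) alignment with a linear gap penalty. The default scores (+1, -1, -1) are assumptions for illustration. Ties are broken by preferring the diagonal, then the vertical move, which is one arbitrary choice among equally good paths.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (optimal global score, aligned a, aligned b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                 # leading gaps in b
    for j in range(1, m + 1):
        F[0][j] = j * gap                 # leading gaps in a
    # fill row by row: each cell takes the best of diagonal, upper and left moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # back tracing from the bottom-right corner
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:        # vertical move: gap in b
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:                                               # horizontal move: gap in a
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))   # (2, 'ACGT', 'A-GT')
```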
  • 25. BLAST
    - Basic Local Alignment Search Tool
    - https://blast.ncbi.nlm.nih.gov/Blast.cgi
    - Multi-step approach to find high-scoring local alignments between two sequences:
      - List words of fixed length (3 amino acids or 11 nucleotides) expected to score above a threshold (seed alignment)
      - For every word, search the database and extend the ungapped alignment in both directions up to a certain length to obtain HSPs (high-scoring segment pairs)
      - Newer versions of BLAST allow gaps
    Programs:
      - Blastn: nucleotide sequences
      - Blastp: protein sequences
      - tBlastn: protein query against a translated (nucleotide) database
      - Blastx: nucleotide query against a protein database
      - tBlastx: nucleotide query against a translated database
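The seed-and-extend idea can be sketched for nucleotides: index the words of the subject sequence, find exact word hits, then extend each hit without gaps until the running score falls a fixed amount below the best seen (an "X-drop" rule). This is a toy sketch, not NCBI's implementation. The word size of 4 (chosen for readability; the slide quotes 11 for nucleotides), the +1/-3 scores, `x_drop` and `min_score` are all assumptions, and protein neighbourhood words, statistics and gapped extension are omitted.

```python
def find_seeds(query, subject, w=4):
    """Exact word hits (query_pos, subject_pos) of length w, via a word index of the subject."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_hit(query, subject, i, j, w, match=1, mismatch=-3, x_drop=5):
    """Ungapped X-drop extension of a seed, first to the right, then to the left."""
    best = score = w * match
    right = k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score >= x_drop:
            break
    score, left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score >= x_drop:
            break
    # HSP as (score, query_start, subject_start, length)
    return best, i - left, j - left, w + left + right


def blast_like(query, subject, w=4, min_score=6):
    """Distinct HSPs scoring at least min_score, best first."""
    hsps = {extend_hit(query, subject, i, j, w) for i, j in find_seeds(query, subject, w)}
    return sorted((h for h in hsps if h[0] >= min_score), reverse=True)


print(blast_like("CCCCACGTTGCACCCC", "GGGGACGTTGCAGGGG"))   # [(8, 4, 4, 8)]
```

Several seeds fall inside the same shared segment; after extension they collapse into one HSP, which is why the results are collected in a set.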
  • 27. Interpretation
    - A rapid and easy way to find homologs by scanning huge databases
    - Searches can also be run against specialized databases
    - BLAST programs use the SEG program to filter low-complexity regions before executing the database search
    - The quality of an alignment is represented by its score (used to identify hits)
    - The significance of an alignment is represented by its E-value (expect value)
    - The E-value decreases exponentially as the score increases
    - The E-value gives the number of matches with at least this score expected purely by chance in a search of a database of this size. The lower the E-value, the less likely the match is due to chance, and therefore the more significant it is.
    - If E is between 0.01 and 10, the match is considered not significant.
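The exponential relation between score and E-value can be made concrete with the Karlin-Altschul formula E = K·m·n·e^(-λS), where m and n are the query and database lengths, and the equivalent bit-score form E = m·n·2^(-S'). This is a sketch: the λ and K values and the lengths below are illustrative assumptions, since the real parameters depend on the scoring matrix and gap costs.

```python
import math


def e_value(score, m, n, lam, k):
    """Expected number of chance hits with raw score >= score: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score, lam, k):
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * score - math.log(k)) / math.log(2)


# Illustrative parameters only
LAM, K = 0.267, 0.041
m, n = 250, 50_000_000        # hypothetical query length and total database length

for s in (40, 60, 80):
    print(s, round(bit_score(s, LAM, K), 1), f"{e_value(s, m, n, LAM, K):.2e}")
```

Each extra unit of raw score multiplies E by e^(-λ), which is the exponential decrease described on the slide.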
  • 28. FASTA
    - More sensitive than BLAST
    - Uses a lookup table to locate all identically matching words of length ktup between two sequences
    - BLAST: hit extension step; FASTA: exact word match
    - A larger ktup makes the search faster but less sensitive; a smaller ktup makes it slower but more sensitive
    - FASTA also uses E-values and bit scores. The FASTA output provides one more statistical parameter, the Z-score.
    - If Z is in the range 5 to 15, the sequence pair can be described as highly probable homologs. If Z < 5, their relationship is less certain.
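The lookup-table step can be sketched as follows: index every ktup word of one sequence, scan the other, and count identical word hits on each diagonal (offset i - j); the densest diagonals mark candidate regions. This covers only FASTA's first stage, under a hypothetical function name; the later rescoring with a substitution matrix, joining of regions and banded optimization are omitted.

```python
from collections import Counter, defaultdict


def ktup_diagonals(a, b, ktup=2):
    """Count identical ktup-word hits on each diagonal (offset i - j) between a and b."""
    table = defaultdict(list)              # lookup table: word -> positions in a
    for i in range(len(a) - ktup + 1):
        table[a[i:i + ktup]].append(i)
    diagonals = Counter()
    for j in range(len(b) - ktup + 1):
        for i in table.get(b[j:j + ktup], []):
            diagonals[i - j] += 1
    return diagonals


hits = ktup_diagonals("GATTACA", "CCGATTACA", ktup=2)
print(hits.most_common(1))   # [(-2, 6)]: the densest diagonal and its number of hits
```

With a larger ktup there are fewer, longer words to look up, which is why the search gets faster but can miss weaker similarities.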
  • 29. Phylogenetics
    - Phylogenetics is the study of evolutionary relatedness among various groups of organisms (e.g. species, populations).
    - Monophyletic group: a common ancestor and all of its descendants
    - Paraphyletic group: a common ancestor and some, but not all, of its descendants
    - Errors in the alignment can mislead the tree
    Methods of phylogenetic analysis:
      - Phenetic: NJ, UPGMA
      - Cladistic: MP, ML
  • 30.
    - A phylogenetic tree shows the evolutionary interrelationships among species or other entities that are believed to have a common ancestor. A phylogenetic tree is a form of cladogram. In a rooted phylogenetic tree, each node with descendants represents the most recent common ancestor of those descendants, and edge lengths may correspond to time estimates.
    - Each node in a phylogenetic tree is called a taxonomic unit. Internal nodes are generally referred to as hypothetical taxonomic units (HTUs) because they cannot be directly observed.
    - Distances: number of changes
    [Figure: parts of a phylogenetic tree: node, root, outgroup, ingroup, branch]
  • 31. Phenetic method of analysis:
    - Also known as numerical taxonomy
    - Uses various measures of overall similarity to rank species
    - All the data are first converted to numerical values without weighting any character; then the number of similarities or differences is calculated
    - Species are then clustered, with the closest grouped together
    - Phenetics lacks evolutionary significance
    Cladistic method of analysis:
    - Alternative approach
    - Diagrams the relationships between taxa
    - Basic assumption: members of a group share a common evolutionary history
    - Typically based on morphological data
  • 32. Distance and Character
    A tree can be based on:
    1. quantitative measures, such as the distance or similarity between species, or
    2. qualitative aspects, such as common characters.
    - Molecular clock assumption: nucleotide or amino acid substitutions in the sequences being compared accumulate at a constant rate.
  • 33. Maximum Parsimony:
    - Finds the optimum tree by minimizing the number of evolutionary changes
    - Makes no assumptions about the evolutionary pattern
    - Starts from a multiple sequence alignment (MSA), then scores candidate trees
    - Rather time-consuming; works well if the sequences have strong similarity
    - May oversimplify evolution
    - May produce several equally good trees
    - Programs: PAUP, MacClade
    Maximum Likelihood:
    - The best tree is found based on assumptions about the model of evolution
    - Nucleotide models are currently more advanced than amino acid models
    - Programs require a lot of computing capacity
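The "scoring" half of maximum parsimony can be sketched with Fitch's algorithm, which counts the minimum number of changes a fixed binary tree needs for each alignment column. Searching over many candidate trees, which programs such as PAUP do, is not shown; the four aligned sequences and both tree shapes are hypothetical, with trees written as nested pairs.

```python
def fitch(tree, site):
    """Return (candidate state set, number of changes) for one alignment column."""
    if isinstance(tree, str):          # leaf: an aligned sequence
        return {tree[site]}, 0
    left, right = tree                 # internal node: a pair of subtrees
    left_set, left_changes = fitch(left, site)
    right_set, right_changes = fitch(right, site)
    common = left_set & right_set
    if common:                         # children can agree: no extra change needed
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, length):
    """Total minimum number of changes over all columns of the alignment."""
    return sum(fitch(tree, site)[1] for site in range(length))


seqs = {"s1": "AAG", "s2": "AAA", "s3": "GGA", "s4": "GAA"}   # hypothetical alignment
tree_1 = ((seqs["s1"], seqs["s2"]), (seqs["s3"], seqs["s4"]))
tree_2 = ((seqs["s1"], seqs["s3"]), (seqs["s2"], seqs["s4"]))
print(parsimony_score(tree_1, 3), parsimony_score(tree_2, 3))   # 3 4
```

The tree needing fewer changes (tree_1 here) is preferred under parsimony; ties between trees are what produce "several equally good trees".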
  • 34. Neighbour Joining:
    - The sequences to be joined are chosen to give the best least-squares estimates of the branch lengths, i.e. those that most closely reflect the actual distances between the sequences
    - The NJ method begins with a star topology in which no neighbours are connected
    - The tree is then modified by joining pairs of sequences; the pair to be joined is chosen by calculating the sum of branch lengths
    - Distance table
    - No molecular clock assumed
    UPGMA:
    - Unweighted Pair Group Method with Arithmetic Mean
    - Works by clustering, starting with the most similar sequences and proceeding towards the most distant
    - Dot representation
    - Molecular clock assumed
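UPGMA's clustering can be sketched directly from a distance table: repeatedly merge the closest pair, place the new node at half their distance (the molecular clock assumption), and set its distance to every other cluster to the size-weighted arithmetic mean. The input format (a dict of label pairs), the merged-cluster labels and the example distances are assumptions; ties are broken by name. Output is a Newick string.

```python
def upgma(dist):
    """Build a UPGMA tree (Newick string) from {(label_a, label_b): distance}, one entry per pair."""
    labels = sorted({x for pair in dist for x in pair})
    clusters = {x: (x, 1, 0.0) for x in labels}           # label -> (newick, size, height)
    d = {tuple(sorted(p)): v for p, v in dist.items()}
    while len(clusters) > 1:
        x, y = min(d, key=lambda p: (d[p], p))             # closest pair (ties broken by name)
        dxy = d.pop((x, y))
        nx, sx, hx = clusters.pop(x)
        ny, sy, hy = clusters.pop(y)
        height = dxy / 2                                   # molecular clock: node sits halfway
        newick = f"({nx}:{height - hx:g},{ny}:{height - hy:g})"
        merged = f"{x}+{y}"
        for z in list(clusters):
            dxz = d.pop(tuple(sorted((x, z))))
            dyz = d.pop(tuple(sorted((y, z))))
            # arithmetic mean over all original members of the two merged clusters
            d[tuple(sorted((merged, z)))] = (sx * dxz + sy * dyz) / (sx + sy)
        clusters[merged] = (newick, sx + sy, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


print(upgma({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6}))   # ((A:1,B:1):2,C:3);
```

Because every leaf ends up at the same distance from the root, UPGMA trees are ultrametric; neighbour joining drops that requirement, which is why it needs no molecular clock.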
  • 35. PHYLIP (Phylogeny Inference Package)
    - Available free for Windows, macOS and Linux systems
    - Parsimony, distance matrix and likelihood methods (with bootstrapping and consensus trees)
    - Data can be molecular sequences, gene frequencies, restriction sites and fragments, distance matrices and discrete characters