HBC1011 Biochemistry I
Lecture 16 and 17 – Exploring
Evolution and Bioinformatics
Ng Chong Han, PhD
ITAR1010, 06-2523751
chng@mmu.edu.my
Overview
• Homology, paralogs, orthologs, convergent
& divergent evolution
• Statistical analysis of sequence alignments
• Evolutionary relationships: protein
sequences & tertiary structures
• Evolutionary tree
Evolutionary relationships are present in protein sequences.
The human myoglobin sequence (red) differs from the chimpanzee sequence
(blue) in only one amino acid in a protein chain of 153 residues
Homologs are molecules derived from a
common ancestor
• Exploration of biochemical evolution attempts to determine
how proteins, other molecules, & biochemical pathways
have been transformed through time.
• Most fundamental relationship between entities =
homology
• Two molecules are said to be homologous if they have been
derived from a common ancestor.
• Homologs can be found by searching sequence databases and
performing sequence-comparison analysis.
• Gene duplication: any duplication of a region of DNA that
contains a gene. It is a major source of new genes during
molecular evolution and can arise from errors in the DNA
replication and repair machinery.
Homologous molecules = homologs
2 classes of homologs:
• Paralogs: homologs present within one species (differ in their detailed biochemical functions, with some exceptions)
• Orthologs: homologs present in different species (very similar or identical functions, with some exceptions)
Homologs that perform identical or
very similar functions in different
organisms are called orthologs,
whereas homologs that perform
different functions within one
organism are called paralogs.
Orthology
• Homologous sequences are orthologous if they are inferred
to be descended from the same ancestral sequence
separated by a speciation event: when a species diverges
into two separate species.
• For instance, the plant Flu regulatory protein is present both
in Arabidopsis (a multicellular higher plant) and
Chlamydomonas (a single-celled green alga). The more complex
Chlamydomonas version can fully substitute for the much
simpler Arabidopsis protein if it is transferred from the alga to the
plant genome by means of molecular cloning.
• Orthologs often, but not always, have the same function.
• Orthologous sequences provide useful information in taxonomic
classification and phylogenetic studies of organisms.
• Two organisms that are very closely related are likely to display
very similar DNA sequences between two orthologs.
Conversely, an organism
that is further removed
evolutionarily from another
organism is likely to display
a greater divergence in the
sequence of the orthologs
being studied.
Paralogy
• Homologous sequences are paralogous if they were
created by a duplication event within the genome.
• For gene duplication events, if a gene in an organism is
duplicated to occupy two different positions in the same
genome, then the two copies are paralogous.
• Paralogous genes often belong to the same species, but
this is not necessary: e.g., the hemoglobin gene of humans
and the myoglobin gene of chimpanzees are paralogs.
• Paralogous sequences provide useful and dramatic
insight into some of the ways genomes evolve.
• Function is not always conserved, however.
• Human angiogenin diverged from ribonuclease, for
example, and while the two paralogs remain similar in
tertiary structure, their functions within the cell are now
quite different.
Paralogy regions
• Sometimes, large chromosomal regions share gene content similar
to other chromosomal regions within the same genome.
• Examples of paralogy regions include regions of human
chromosomes 2, 7, 12, and 17 containing Hox gene clusters, collagen
genes and keratin genes.
Two segments of DNA can have shared ancestry because of either
a speciation event (orthologs) or a duplication event (paralogs).
The importance of studying homology
• Reveals the evolutionary history of molecules
• Provides information about their function
• E.g., if a newly sequenced protein is homologous to an
already characterized protein, this is a strong indication of the
new protein's biochemical function.
Statistical analysis of sequence alignments
can detect homology
• How can we know whether 2 human proteins are paralogs
or whether a yeast protein is the ortholog of a human
protein?
• Significant sequence similarity between 2 molecules =
likely to have the same evolutionary origin & therefore
similar 3-D structures, functions & mechanisms.
• Since protein sequences are better conserved
evolutionarily than nucleotide sequences, protein
sequence comparison produces more reliable and
accurate results when dealing with coding DNA.
Sequence comparison methods
• The sequences of two proteins that have an ancestor in common
will have diverged in a variety of ways.
• Insertions and deletions may have occurred at the ends of the
proteins or within the functional domains themselves.
• Individual amino acids may have been mutated to other residues
of varying degrees of similarity.
Sequences compared: human hemoglobin (α chain), 141 a.a., and human myoglobin (a single chain), 153 a.a.
• Globins
– Myoglobin: binds oxygen in muscle
– Hemoglobin: oxygen-carrying protein in blood,
composed of 2 identical α chains & 2 identical β chains
• Both cradle a heme group: an iron containing organic
molecule that binds the oxygen.
To detect sequence
similarity, we perform
sequence alignment.
How can we tell where to align the 2
sequences?
• Approach:
– Compare all possible juxtapositions of one protein
sequence with another, in each case recording
the number of identical residues that are aligned
with one another.
– The comparison can be accomplished by simply
sliding one sequence past the other, one amino acid at a
time, & counting the number of matched residues.
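The sliding comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`count_matches`, `best_offsets`) and the toy sequences are hypothetical and are not the globin sequences shown in the slides.

```python
def count_matches(seq_a, seq_b, offset):
    """Count identical residues when seq_b is shifted by `offset` along seq_a."""
    matches = 0
    for i, residue in enumerate(seq_b):
        j = i + offset  # position in seq_a that faces residue i of seq_b
        if 0 <= j < len(seq_a) and seq_a[j] == residue:
            matches += 1
    return matches


def best_offsets(seq_a, seq_b):
    """Slide seq_b past seq_a one residue at a time.

    Returns the largest number of matches and every offset that reaches it.
    """
    scores = {
        offset: count_matches(seq_a, seq_b, offset)
        for offset in range(-(len(seq_b) - 1), len(seq_a))
    }
    best = max(scores.values())
    return best, [offset for offset, score in scores.items() if score == best]


# Toy sequences (hypothetical, chosen only to illustrate the procedure)
best, offsets = best_offsets("GLSDGEWQLV", "SDGEWQ")
print(best, offsets)
```

Plotting the match count against the offset would give the kind of graph described in the figure caption for this approach.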
(A) A comparison is made by sliding the sequences of the 2 proteins past each other, 1 amino acid at a time, and counting the number of amino acid identities between the proteins.
(B) The 2 alignments with the largest number of matches are shown above the graph, which plots the matches as a function of alignment.
Alignment with gap insertion
• The sequences can be aligned to capture most of the
identities by introducing a gap into one of the sequences.
• Gaps are inserted to compensate for the insertions or deletions of
nucleotides that may have taken place in the gene.
• Gaps increase the complexity of sequence alignment because a gap
of arbitrary size can, in principle, be placed at any position.
• Method: use a scoring system to compare different
alignments & include penalties for gaps (to prevent the insertion
of an unreasonable number of gaps)
Alignment with gap insertion:
Scoring system
• The alignment of α hemoglobin & myoglobin after a gap has
been inserted into the hemoglobin α sequence
Identity between aligned residues = +10 points;
gap (regardless of size) = -25 points.
38 identities & 1 gap; score = (38 × 10) + (1 × -25) = 355.
38 matched amino acids over an average length of 147 residues ((153 + 141)/2),
so the sequences are 25.9% (38/147 × 100) identical.
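The arithmetic of this simple scoring system can be checked with a short Python sketch. The function names are hypothetical; the point values (+10 per identity, -25 per gap) and the sequence lengths are the ones stated above.

```python
def identity_score(identities, gaps, match_points=10, gap_penalty=25):
    """Score an alignment: +10 per identity, -25 per gap regardless of gap size."""
    return identities * match_points - gaps * gap_penalty


def percent_identity(identities, len_a, len_b):
    """Percent identity relative to the average length of the two sequences."""
    mean_length = (len_a + len_b) / 2
    return 100 * identities / mean_length


print(identity_score(38, 1))                      # 355
print(round(percent_identity(38, 141, 153), 1))   # 25.9
```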
The statistical significance of alignments can
be estimated by shuffling
• Because proteins are composed of the same set of 20 amino
acids, the alignment of any two unrelated proteins will yield
some identities, especially if gaps are allowed.
• Even if two proteins have identical amino acid composition,
they may not be linked by evolution. It is the order of the
residues that implies a relationship.
How can we estimate the probability that a specific series of identities is a chance occurrence?
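• One of the sequences is shuffled (its residues are rearranged at
random) and the alignment is repeated. Shuffling many
times yields a histogram of scores. The score from the original
alignment should be far higher than the scores from the shuffled
sequences if the alignment is not a chance occurrence.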
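A shuffling test of this kind might look like the following Python sketch. It is not the procedure used to produce the histogram in the slides: for brevity it scores each alignment by the best ungapped match count rather than by a gapped score, and the function names, trial count, and toy sequences are all assumptions made for illustration.

```python
import random


def best_ungapped_matches(seq_a, seq_b):
    """Largest number of identities over all ungapped offsets of seq_b along seq_a."""
    best = 0
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = sum(
            1
            for i, residue in enumerate(seq_b)
            if 0 <= i + offset < len(seq_a) and seq_a[i + offset] == residue
        )
        best = max(best, matches)
    return best


def shuffle_test(seq_a, seq_b, trials=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling seq_b.

    Shuffling keeps the amino acid composition but destroys the residue order.
    Returns the observed score, the shuffled scores, and the fraction of
    shuffles that scored at least as high as the observed alignment.
    """
    rng = random.Random(seed)
    observed = best_ungapped_matches(seq_a, seq_b)
    residues = list(seq_b)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)
        shuffled_scores.append(best_ungapped_matches(seq_a, "".join(residues)))
    at_least = sum(score >= observed for score in shuffled_scores)
    return observed, shuffled_scores, at_least / trials


# Toy sequences (hypothetical); a small fraction suggests a non-chance alignment
observed, scores, fraction = shuffle_test("MKVLSEGEWQLVLHVWAK", "KVLSDGEWQLVLNVW")
print(observed, max(scores), fraction)
```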
Figure: histogram of alignment scores from randomly shuffled sequences, with the original alignment score marked. The high alignment score does not occur by chance.
Distant evolutionary relationships can be
detected through the use of substitution matrices
• The scoring scheme discussed previously assigned
points only to positions occupied by identical amino acids.
• It gave no credit for non-identical amino acids.
• What about substitutions?
• A scoring system based solely on amino acid
identity cannot account for these changes.
Types of substitution
• Conservative substitution: replacing one amino acid with another that is similar in size and chemical properties. May have minor effects on protein structure and can thus be tolerated without compromising function.
• Nonconservative substitution: an amino acid replaces one that is dissimilar.
Conservative and single-nucleotide substitutions are likely to be more common than are substitutions with more radical effects.
Substitution matrix
• Substitution matrix – a scoring system for the replacement of
any amino acid with each of the other 19 amino acids.
• A large positive score corresponds to a substitution that occurs
relatively frequently.
• A large negative score corresponds to a substitution that occurs
only rarely.
• When 2 sequences are compared, each substitution is assigned a
score based on the matrix.
Blosum-62: Blocks of amino acid substitution matrix
Figure: Blosum-62 substitution matrix. Amino acids are listed in the order D E H K R N Q S T A C G P F I L M V W Y and colored red: charged, green: polar, blue: large and hydrophobic, black: other.
• Arginine → Lysine: conservative
• Valine → Lysine: nonconservative
Blosum-62 score
• A single-residue gap: -12 points
• Each additional residue in the gap: -2 points
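Scoring an existing alignment with a substitution matrix and these gap penalties could be sketched as follows. Only a handful of Blosum-62 entries are included, and the dictionary name, function names, and toy alignments are hypothetical; a real analysis would load the full matrix from a bioinformatics library.

```python
# A few Blosum-62 entries only (illustrative subset, not the full matrix)
BLOSUM62_SUBSET = {
    ("K", "K"): 5, ("R", "R"): 5, ("V", "V"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("K", "R"): 2,   # arginine <-> lysine: conservative, positive score
    ("K", "V"): -2,  # valine <-> lysine: nonconservative, negative score
    ("D", "E"): 2,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]  # KeyError if the pair is not in the subset


def score_alignment(row_a, row_b, gap_open=12, gap_extend=2):
    """Score two aligned rows ('-' marks a gap position).

    The first residue of a gap costs 12 points, each additional residue 2.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    gap_row = None  # which row the current gap run is in, if any
    for a, b in zip(row_a, row_b):
        current = "a" if a == "-" else "b" if b == "-" else None
        if current is None:
            total += pair_score(a, b)
        elif current == gap_row:
            total -= gap_extend
        else:
            total -= gap_open
        gap_row = current
    return total


print(score_alignment("KVDE-K", "RVEEKK"))  # 2 + 4 + 2 + 5 - 12 + 5 = 6
print(score_alignment("KV--K", "KVDEK"))    # 5 + 4 - 12 - 2 + 5 = 0
```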
• The alignment of hemoglobin & myoglobin with conservative
substitutions indicated by yellow shading and identities by
orange. Score = 115
Blosum-62
• Blosum-62 detects homology between less obviously
related sequences (it does not only detect identities).
• Alignment of human myoglobin & lupine (plant)
leghemoglobin. Identities: orange boxes; conservative
substitutions are also highlighted. These sequences are 23% identical.
Alignment of identities versus Blosum-62
• Alignment of identities: the probability that the alignment occurred
by chance alone is high (1 in 20).
• Blosum-62: the probability that the alignment occurred by chance
alone is very low (1 in 300), which allows a better, firmer conclusion.
Sequence analysis – rule of thumb
• For sequences longer than 100 amino acids, sequence
identities > 25% = statistically significant similarity =
sequences are probably homologous.
• If 2 sequences are less than 15% identical = pairwise
comparison alone is unlikely to indicate statistically
significant similarity.
• If identity is between 15% and 25%, further analysis is needed.
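The rule of thumb above can be written as a small decision function. This is only a restatement of the thresholds in the bullets; the function name and the returned wording are hypothetical.

```python
def interpret_identity(percent_identity, length):
    """Apply the rule of thumb for pairwise percent identity.

    The rule is stated only for sequences longer than 100 amino acids.
    """
    if length <= 100:
        return "rule of thumb not applicable (sequence too short)"
    if percent_identity > 25:
        return "probably homologous"
    if percent_identity < 15:
        return "pairwise comparison alone unlikely to show significant similarity"
    return "further analysis needed"


print(interpret_identity(25.9, 147))  # hemoglobin α vs myoglobin figures from above
print(interpret_identity(23.0, 150))  # falls in the 15% to 25% range
```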
The lack of a statistically significant degree of sequence
similarity does not rule out homology
Why??
Homology vs. similarity
Similarity
• Similarity refers to the likeness or % identity between 2 sequences
• Similarity means sharing a statistically significant number of amino acids
• Similarity does not imply homology
Homology
• Homology refers to shared ancestry
• Two sequences are homologous if they are derived from a common ancestral sequence
• Homology usually implies similarity
Homology among proteins is often incorrectly concluded on the basis of
sequence similarity. High sequence similarity might occur because
of convergent evolution, or, as with shorter sequences, because of chance.
Such sequences are similar but not homologous.
Databases can be searched to identify
homologous sequences
• Database search for homologous seq: using online
resources on NCBI (National Center for Biotechnology
Information)
• Procedure: BLAST (Basic Local Alignment Search Tool)
search.
• Result: a list of sequence alignments.
• Open reading frame (ORF): protein-coding region
• Hypothetical protein: ORF with no assigned function
E value (highlighted in red): the number of sequences with this
level of similarity expected to be in the database by chance is 2 × 10^-25
Examination of 3-D structure enhances our
understanding of evolutionary relationship
• To gain a deeper understanding of evolutionary
relationships between proteins, we must examine
3-D structures because
– The sequences of many proteins that have been
descended from a common ancestor have diverged to
such an extent that the relationship between the proteins
can no longer be detected from their sequences alone.
– Biomolecules generally function as intricate 3-D structures
rather than as linear polymers.
– Mutations in the sequence affect function, & function is directly
related to tertiary structure.
Tertiary structure is more conserved than
primary structure
• Because 3-D structure is much more closely
associated with function than sequence is, tertiary
structure is more evolutionarily conserved than
primary structure.
• E.g., the tertiary structures of the globins are extremely similar
even though the similarity between human
myoglobin & lupine leghemoglobin is just barely
detectable at the sequence level & that between human
hemoglobin and lupine leghemoglobin is not
statistically significant.
Conservation of 3-D structure. The tertiary structures of human hemoglobin,
human myoglobin, & lupine leghemoglobin are conserved. This structural
similarity firmly establishes that the framework that binds the heme group &
facilitates the reversible binding of oxygen has been conserved over a long
evolutionary period.
• Comparison of 3-D structures has revealed striking
similarities between proteins that were not expected
to be related.
• E.g., the protein actin (a major component of the
cytoskeleton) & heat shock protein 70 (which assists
protein folding inside the cell)
– Similar in structure, only 15.6% sequence identity
– Paralogs
– Different biological roles, descended from a
common ancestor
Structures of Actin & Hsp-70. A comparison of the identically colored
elements of secondary structure reveals the overall similarity in structure
despite the difference in biochemical activities.
Conserved functional sequences
• Regions & residues critical for protein function are more
strongly conserved than are other residues.
• E.g., each type of globin contains a bound heme group with
an iron atom at its center. A histidine residue that interacts
directly with this iron is conserved in all globins.
• Key residues or highly conserved sequences identified within a family
of proteins can be used to identify other family members even when the
overall level of sequence similarity is below statistical significance.
Divergent and Convergent evolution
• Divergent evolution: process by which 2 or more biological
characteristics have a common origin, but have diverged
over evolutionary time.
How might two unrelated proteins come to resemble each
other structurally? Two proteins evolving independently may
have converged on a similar structure in order to perform a
similar biochemical activity.
• Convergent evolution: process by which very different
evolutionary pathways lead to the same solution (different
origin points).
One example of convergent evolution is found among the serine
proteases, enzymes that cleave peptide bonds by
hydrolysis. The structures of the active sites at which the
hydrolysis reaction takes place are remarkably similar.
The similarity might suggest that these proteins are homologous.
However, striking differences in the overall structures of these
proteins make an evolutionary relationship extremely unlikely.
Evolutionary tree can be constructed on the
basis of sequence information
• Aligned sequences can be used to construct an
evolutionary tree in which the length of the branch
connecting each pair of proteins is proportional to the
number of amino acid differences between the
sequences. Branch lengths indicate genetic change i.e.
the longer the branch, the more genetic change has
occurred.
• To estimate the approximate dates of gene duplications
& other evolutionary events, an evolutionary tree can be
calibrated by comparing the deduced branch points with
divergence times determined from the fossil record.
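The quantity that sets the branch lengths, the number of amino acid differences between each pair of aligned sequences, can be computed as in the Python sketch below. It covers only this first step and does not build the tree itself; the function names and the toy sequences are hypothetical, and gap characters, if present, are simply counted as differences.

```python
from itertools import combinations


def count_differences(aligned_a, aligned_b):
    """Number of alignment positions at which two aligned sequences differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


def pairwise_differences(aligned):
    """Differences for every pair of named, aligned sequences."""
    return {
        (name_1, name_2): count_differences(aligned[name_1], aligned[name_2])
        for name_1, name_2 in combinations(aligned, 2)
    }


# Toy aligned sequences (hypothetical)
aligned = {"seq_A": "MKVLDE", "seq_B": "MKVLDD", "seq_C": "MRILEE"}
diffs = pairwise_differences(aligned)
print(diffs)

# The pair with the fewest differences would be joined by the shortest branches
closest = min(diffs, key=diffs.get)
print(closest)
```

A tree-building method (for example, a distance-based clustering method) would then join the closest pairs first and assign branch lengths from these counts.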
An evolutionary tree for globins. The branching structure was deduced by
sequence comparison, whereas the results of fossil studies provided the
overall time scale showing when divergence occurred.
How can we estimate the approximate dates of gene
duplications and other evolutionary events?
• Duplication leading to the 2 chains of hemoglobin appears to
have occurred 350 million years ago.
– This estimation is supported by the observation that
jawless fish such as the lamprey, which diverged from bony
fish ~400 million years ago, contain hemoglobin built from a
single type of polypeptide chain.
Modern techniques make the experimental
exploration of evolution possible
• Ancient DNA can sometimes be amplified and sequenced using
polymerase chain reaction (PCR) and DNA sequencing.
• This approach has been applied to mitochondrial DNA from a
Neanderthal fossil estimated at between 30,000 and 100,000 years
of age found near Düsseldorf, Germany, in 1856. Comparison with
the sequences from Homo sapiens revealed between 22 and 36
substitutions, considerably fewer than the average of 55 differences
between human beings and chimpanzees over the common bases in
this region.
• Further analysis suggested that the common ancestor of modern
human beings and Neanderthals lived approximately 600,000
years ago.
• An evolutionary tree constructed by using these and other data
revealed that the Neanderthal was not an intermediate between
chimpanzees and human beings but, instead, was an evolutionary
"dead end" that became extinct
Successful sequencing of
ancient DNA requires
sufficient DNA for reliable
amplification and the
rigorous exclusion of all
sources of contamination.
Archeological sites in Indonesia
• Homo floresiensis ("Flores Man"; nicknamed "hobbit") is an
extinct species thought to be in the genus Homo. The remains of
an individual (1.1 m in height) were discovered in 2003 at Liang
Bua on the island of Flores in Indonesia.
• This hominin had originally been considered remarkable
for its survival until only 12,000 years ago. However, by 2016,
further work had pushed the most recent evidence of its existence back to
about 50,000 years ago.
Glossary
• BLOSUM
– Blocks Substitution Matrix. A substitution matrix in which scores for
each position are derived from observations of the frequencies of
substitutions in blocks of local alignments in related proteins. Each
matrix is tailored to a particular evolutionary distance. In the
BLOSUM62 matrix, for example, the alignment from which scores
were derived was created using sequences sharing no more than
62% identity.
• Alignment
– The process of lining up two or more sequences to achieve
maximal levels of identity (and conservation, in the case of amino
acid sequences) for the purpose of assessing the degree of
similarity and the possibility of homology.
• Juxtaposition
– The act of placing two or more things side by side or the state of
being so placed.
• E value
– Expectation value. The number of different alignments with
scores equivalent to or better than raw score that are expected to
occur in a database search by chance. The lower the E value, the
more significant the score.
• Substitution
– The presence of a non-identical amino acid at a given position in
an alignment. If the aligned residues have similar physico-
chemical properties the substitution is said to be "conservative".
• Conservation
– Changes at a specific position of an amino acid (or, less
commonly, DNA) sequence that preserve the physico-chemical
properties of the original residue.
• Identity
– The extent to which two (nucleotide or amino acid) sequences
are invariant.
• Gap
– A space introduced into an alignment or position at which a letter
is paired with a null.
• Similarity
– The extent to which nucleotide or protein sequences are related.
The extent of similarity between two sequences can be based on
percent sequence identity and/or conservation. In BLAST
similarity refers to a positive matrix score.
• Query
– The input sequence (or other type of search term) with which all
of the entries in a database are to be compared.
Summary
1. Homologs are descended from a common ancestor.
2. Statistical analysis of sequence alignments can detect
homology.
3. Examination of three-dimensional structure enhances our
understanding of evolutionary relationships.
4. Evolutionary trees can be constructed on the basis of
sequence information.
Study questions
1. What are the differences between a paralog and an ortholog?
2. How can we study the function of a novel gene using
sequence alignment?
3. Why is it possible for two similar sequences not to be homologous?
4. Why does protein sequence comparison produce more
accurate results than nucleotide sequence comparison?
5. Why is tertiary structure more evolutionarily conserved than
its primary structure?
6. What is a conservative substitution?
7. What is a sequence alignment?
8. What online tool can be used to search for homologous
sequences?
How confident can we be that orthologs are
similar, but paralogs differ?
• The idea that orthologs share similar functions, whereas
paralogs have different functions, has become accepted
by many and is the standard textbook model, as exemplified
by the 'Phylogenetics Factsheet' of the National Center for
Biotechnology Information (NCBI)
(http://www.ncbi.nlm.nih.gov/About/primer/phylo.html).
• However, newer evidence shows that orthologs and
paralogs are not so different in either their evolutionary rates
or their mechanisms of divergence.
• Thus, functional change between orthologs might be as
common as between paralogs, and future studies should be
designed to test the impact of duplication against this
alternative model.
Studer and Robinson-Rechavi (2009)

225377 lecture 19 20225377 lecture 19 20
225377 lecture 19 20
mohamedseyam13
 
225375 lecture 18
225375 lecture 18225375 lecture 18
225375 lecture 18
mohamedseyam13
 
222396 lecture 14 15
222396 lecture 14 15222396 lecture 14 15
222396 lecture 14 15
mohamedseyam13
 
220739 lecture 12 13
220739 lecture 12 13220739 lecture 12 13
220739 lecture 12 13
mohamedseyam13
 
219160 lecture 11
219160 lecture 11219160 lecture 11
219160 lecture 11
mohamedseyam13
 
219159 lecture 10
219159 lecture 10219159 lecture 10
219159 lecture 10
mohamedseyam13
 
219158 lecture 9
219158 lecture 9219158 lecture 9
219158 lecture 9
mohamedseyam13
 
219103 lecture 8
219103 lecture 8219103 lecture 8
219103 lecture 8
mohamedseyam13
 

More from mohamedseyam13 (20)

Lecture 4 5
Lecture 4 5Lecture 4 5
Lecture 4 5
 
Lecture 2 3
Lecture 2 3Lecture 2 3
Lecture 2 3
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
229983 lecture 26
229983 lecture 26229983 lecture 26
229983 lecture 26
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
229983 lecture 26
229983 lecture 26229983 lecture 26
229983 lecture 26
 
Lecture 4 5
Lecture 4 5Lecture 4 5
Lecture 4 5
 
Lecture 2 3
Lecture 2 3Lecture 2 3
Lecture 2 3
 
212121 lecture 2 and 3
212121 lecture 2 and 3212121 lecture 2 and 3
212121 lecture 2 and 3
 
229981 lecture 25
229981 lecture 25229981 lecture 25
229981 lecture 25
 
228216 lec14 15 slide 64
228216 lec14 15 slide 64228216 lec14 15 slide 64
228216 lec14 15 slide 64
 
228132 lecture 21 22
228132 lecture 21 22228132 lecture 21 22
228132 lecture 21 22
 
225377 lecture 19 20
225377 lecture 19 20225377 lecture 19 20
225377 lecture 19 20
 
225375 lecture 18
225375 lecture 18225375 lecture 18
225375 lecture 18
 
222396 lecture 14 15
222396 lecture 14 15222396 lecture 14 15
222396 lecture 14 15
 
220739 lecture 12 13
220739 lecture 12 13220739 lecture 12 13
220739 lecture 12 13
 
219160 lecture 11
219160 lecture 11219160 lecture 11
219160 lecture 11
 
219159 lecture 10
219159 lecture 10219159 lecture 10
219159 lecture 10
 
219158 lecture 9
219158 lecture 9219158 lecture 9
219158 lecture 9
 
219103 lecture 8
219103 lecture 8219103 lecture 8
219103 lecture 8
 

Recently uploaded

Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
anitaento25
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 

Recently uploaded (20)

Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 

222397 lecture 16 17

  • 1. HBC1011 Biochemistry I Lecture 16 and 17 – Exploring Evolution and Bioinformatics Ng Chong Han, PhD ITAR1010, 06-2523751 chng@mmu.edu.my
  • 2.
  • 3. Overview • Homology, paralogs, orthologs, convergent & divergent evolution • Statistical analysis of sequence alignments • Evolutionary relationships: protein sequences & tertiary structures • Evolutionary tree 3
  • 4. Evolutionary relationships are present in protein sequences. The human myoglobin sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues
  • 5. Homologs are molecules derived from a common ancestor • The exploration of biochemical evolution attempts to determine how proteins, other molecules, & biochemical pathways have been transformed through time. • Most fundamental relationship between entities = homology • 2 molecules are said to be homologous if they have been derived from a common ancestor. • Homologs can be found by searching sequence databases and comparing sequences. • Gene duplication: any duplication of a region of DNA that contains a gene. Duplications are generated during molecular evolution and can arise as products of the DNA replication and repair machinery.
  • 6. Homologous molecules = homologs. Paralogs: homologs present within one species (differ in their detailed biochemical functions, with some exceptions). Orthologs: homologs present in different species (very similar or identical functions, with some exceptions).
  • 7. 2 classes of homologs • Homologs that perform identical or very similar functions in different organisms are called orthologs, whereas homologs that perform different functions within one organism are called paralogs.
  • 8. Orthology 8 • Homologous sequences are orthologous if they are inferred to be descended from the same ancestral sequence separated by a speciation event: when a species diverges into two separate species. • For instance, the plant Flu regulatory protein is present both in Arabidopsis (multicellular higher plant) and Chlamydomonas (single cell green algae). The complex Chlamydomonas version can fully substitute the much simpler Arabidopsis protein, if transferred from algae to plant genome by means of molecular cloning. • Orthologs often, but not always, have the same function.
  • 9. Orthology 9 • Orthologous sequences provide useful information in taxonomic classification and phylogenetic studies of organisms. • Two organisms that are very closely related are likely to display very similar DNA sequences between two orthologs. Conversely, an organism that is further removed evolutionarily from another organism is likely to display a greater divergence in the sequence of the orthologs being studied.
  • 10. Paralogy 10 • Homologous sequences are paralogous if they were created by a duplication event within the genome. • For gene duplication events, if a gene in an organism is duplicated to occupy two different positions in the same genome, then the two copies are paralogous. • Paralogous genes often belong to the same species, but this is not necessary: eg, the hemoglobin gene of humans and the myoglobin gene of chimpanzees are paralogs.
  • 11. Paralogy • Paralogous sequences provide useful and dramatic insight into some of the ways genomes evolve. • Function is not always conserved, however. • Human angiogenin diverged from ribonuclease, for example, and while the two paralogs remain similar in tertiary structure, their functions within the cell are now quite different.
  • 12. Paralogy regions 12 • Sometimes, large chromosomal regions share gene content similar to other chromosomal regions within the same genome. • Examples of paralogy regions include regions of human chromosome 2, 7, and 12 containing Hox gene clusters, collagen genes and keratin genes.
  • 13. (common ancestor) Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs).
  • 14. The importance of the study of homology • Reveals the evolutionary history of molecules • Provides information about their function • i.e.: if a newly sequenced protein is homologous to an already characterized protein, this is a strong indication of the new protein’s biochemical function.
  • 15. Statistical analysis of sequence alignments can detect homology • How can we know whether 2 human proteins are paralogs or whether a yeast protein is the ortholog of a human protein? • Significant sequence similarity between 2 molecules = likely to have the same evolutionary origin & therefore the same 3-D structure, function & mechanism. • Since protein sequences are better conserved evolutionarily than nucleotide sequences, protein sequence comparison produces more reliable and accurate results when dealing with coding DNA.
  • 16. Sequence comparison methods • The sequences of two proteins that have an ancestor in common will have diverged in a variety of ways. • Insertions and deletions may have occurred at the ends of the proteins or within the functional domains themselves. • Individual amino acids may have been mutated to other residues of varying degrees of similarity. • Sequences compared: human hemoglobin (α chain), 141 a.a., & human myoglobin, 153 a.a.
  • 17. Sequence comparison methods • Globins – Myoglobin: binds oxygen in muscle – Hemoglobin: oxygen-carrying protein in blood, composed of 2 identical α chains & 2 identical β chains • Both cradle a heme group: an iron containing organic molecule that binds the oxygen. 17 To detect sequence similarity, we perform sequence alignment.
  • 18. How can we tell where to align the 2 sequences? • Approach: – Compare all possible juxtapositions of one protein sequence with another, in each case recording the number of identical residues that are aligned with one another. – The comparison can be accomplished by simply sliding one sequence past the other, one a.a at a time, & counting the number of matched residues.
  • 19. (A) A comparison is made by sliding the sequences of the 2 proteins past each other, 1 amino acid at a time, and counting the number of amino acid identities between the proteins (B) The 2 alignments with the largest number of matches are shown above the graph, which plots the matches as a function of alignment. Largest no. of matches
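The sliding comparison described on slides 18 and 19 can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used to produce the slide's graph: the function names and the short test strings are hypothetical, and no gaps are considered.

```python
def slide_and_count(seq_a, seq_b):
    """Slide seq_b past seq_a one residue at a time.

    Returns a dict mapping each offset (position in seq_a where seq_b[0]
    lands) to the number of identical residues in that juxtaposition.
    """
    counts = {}
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = 0
        for j, res_b in enumerate(seq_b):
            i = offset + j
            # Only positions where both sequences overlap can match.
            if 0 <= i < len(seq_a) and seq_a[i] == res_b:
                matches += 1
        counts[offset] = matches
    return counts


def best_offset(seq_a, seq_b):
    """Return the offset with the largest number of identities."""
    counts = slide_and_count(seq_a, seq_b)
    return max(counts, key=counts.get)


# Hypothetical toy sequences, chosen only to illustrate the idea.
toy_a = "GLSDGEWQLV"
toy_b = "DGEWQ"
print(best_offset(toy_a, toy_b), slide_and_count(toy_a, toy_b)[3])
```

Plotting the values of `slide_and_count` against the offset would give the same kind of graph as panel (B): matches as a function of alignment.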
  • 20. Alignment with gap insertion • The sequences can be aligned to capture most of the identities by introducing a gap into one of the sequences. • Gaps are inserted to compensate for the insertions/deletions of nucleotides that may have taken place in the gene. • Gaps increase the complexity of sequence alignment because a gap can be of arbitrary size. • Method: use a scoring system to compare different alignments & include penalties for gaps (to prevent an unreasonable number of insertions).
  • 21. Alignment with gap insertion: Scoring system • The alignment of α hemoglobin & myoglobin after a gap has been inserted into the hemoglobin α sequence. • Identity between aligned residues = +10 points; gap (regardless of size) = -25 points. • 38 identities & 1 gap: score = (38 × 10) + (1 × -25) = 355. • 38 matched amino acids over an average length of 147 residues ((153 + 141)/2), so the sequences are 25.9% (38/147 × 100) identical.
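The arithmetic on slide 21 can be reproduced with a short Python sketch. The function names are hypothetical; the point values (+10 per identity, -25 per gap) and the sequence lengths are the ones given on the slide.

```python
def identity_score(identities, gaps, match_points=10, gap_penalty=25):
    """Score an alignment by identities only, with a flat penalty per gap."""
    return identities * match_points - gaps * gap_penalty


def percent_identity(identities, len_a, len_b):
    """Percent identity relative to the mean length of the two sequences."""
    mean_length = (len_a + len_b) / 2
    return 100 * identities / mean_length


# Values from the hemoglobin alpha / myoglobin alignment on the slide.
print(identity_score(38, 1))                      # 355
print(round(percent_identity(38, 141, 153), 1))   # 25.9
```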
  • 22. The statistical significance of alignments can be estimated by shuffling 22 • Because proteins are composed of the same set of 20 amino acids, the alignment of any two unrelated proteins will yield some identities, especially if gaps are allowed. • Even if two proteins have identical amino acid composition, they may not be linked by evolution. It is the order of the residues that implies a relationship. How can we estimate the probability that a specific series of identities is a chance occurrence?
  • 23. The statistical significance of alignments can be estimated by shuffling 23 • The process of the sequences shuffling is repeated many times to yield a histogram – the score from the original alignment should be higher than the scores from random shuffling. The high alignment score does not occur by chance. Original alignment score Random alignment score
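The shuffling test on slides 22 and 23 might be sketched as follows. This is a simplified illustration under stated assumptions: it scores alignments by the best ungapped identity count rather than the gapped scoring system above, and the function names, trial count, and seed parameter are all hypothetical.

```python
import random


def best_ungapped_identities(seq_a, seq_b):
    """Largest number of identities over all ungapped juxtapositions."""
    best = 0
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = sum(
            1
            for j, res in enumerate(seq_b)
            if 0 <= offset + j < len(seq_a) and seq_a[offset + j] == res
        )
        best = max(best, matches)
    return best


def shuffle_test(seq_a, seq_b, trials=1000, seed=None):
    """Score the real pair, then score seq_a against shuffled copies of seq_b.

    Shuffling keeps the amino acid composition of seq_b but destroys
    the order of its residues.
    """
    rng = random.Random(seed)
    original = best_ungapped_identities(seq_a, seq_b)
    residues = list(seq_b)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)
        shuffled_scores.append(
            best_ungapped_identities(seq_a, "".join(residues))
        )
    return original, shuffled_scores


def fraction_at_least(original, shuffled_scores):
    """Fraction of shuffled scores that reach the original score."""
    return sum(s >= original for s in shuffled_scores) / len(shuffled_scores)
```

A histogram of `shuffled_scores` corresponds to the random alignment scores on the slide; if `fraction_at_least` is close to zero, the original score is unlikely to be a chance occurrence.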
  • 24. Distant evolutionary relationships can be detected through the use of substitution matrices • Scoring scheme discussed previously assigned points only to positions occupied by identical a.a • No credit for non-identical a.a • How about substitution? • A scoring system based solely on amino acid identity cannot account for these changes. 24
  • 25. Types of substitution • Conservative substitution: replacing one a.a with another that is similar in size and chemical properties. May have minor effects on protein structure and can thus be tolerated without compromising function. • Nonconservative substitution: an amino acid replaces one that is dissimilar. • Conservative and single-nucleotide substitutions are likely to be more common than are substitutions with more radical effects.
  • 26. Substitution matrix • Substitution matrix – a scoring system for the replacement of any amino acid with each of the other 19 amino acids. • Large positive score corresponds to substitution that occurs relatively frequently • Large negative score corresponds to substitution that occurs only rarely • When 2 seq are compared, each substitution is assigned a score based on matrix. 26 Blosum-62 : Blocks of amino acid substitution matrix
  • 27. Blosum-62 substitution matrix. Arginine → Lysine: conservative. Valine → Lysine: nonconservative. Amino acid order on the matrix axes: D E H K R N Q S T A C G P F I L M V W Y. Color key: red: charged, green: polar, blue: large and hydrophobic, black: other.
  • 28. Blosum-62 score • A single-residue gap: -12 points • Each additional residue in the gap: -2 points • Figure legend: identities, conservative substitutions, gap.
  • 29. Blosum-62 score • The alignment of hemoglobin & myoglobin with conservative substitutions indicated by yellow shading and identities by orange. Score = 115 29 identities Conservative substitution gap
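To make the matrix-based scoring concrete, here is a minimal Python sketch. It assumes a small excerpt of the Blosum-62 matrix covering only lysine (K), arginine (R), and valine (V); the excerpt, the function names, and the toy alignments are illustrative assumptions, while the gap penalties (-12 for the first residue of a gap, -2 for each additional residue) follow the slide.

```python
# Assumed excerpt of Blosum-62 for three residues only (K, R, V).
BLOSUM62_EXCERPT = {
    ("K", "K"): 5,
    ("R", "R"): 5,
    ("V", "V"): 4,
    ("K", "R"): 2,    # conservative substitution: positive score
    ("K", "V"): -2,   # nonconservative substitution: negative score
    ("R", "V"): -3,
}


def pair_score(a, b):
    """Look up a residue pair in either order."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def gap_cost(length, first=-12, extend=-2):
    """Cost of one gap: -12 for the first residue, -2 per extra residue."""
    return first + extend * (length - 1)


def score_alignment(row_a, row_b):
    """Score two equal-length aligned rows in which '-' marks a gap.

    Simplification: consecutive gap columns are treated as a single gap
    even if they switch from one row to the other.
    """
    total = 0
    gap_run = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            gap_run += 1
            continue
        if gap_run:
            total += gap_cost(gap_run)
            gap_run = 0
        total += pair_score(a, b)
    if gap_run:
        total += gap_cost(gap_run)
    return total


print(score_alignment("KRV", "KKV"))    # 5 + 2 + 4 = 11
print(score_alignment("K--V", "KRRV"))  # 5 + (-12 - 2) + 4 = -5
```

Unlike the identity-only scheme, the R/K pair in the first toy alignment still earns credit, which is how a substitution matrix picks up more distant relationships.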
  • 30. Blosum-62 • Blosum-62: Detects homology between less obviously related sequences (not only detect identity) • Alignment of human myoglobin & lupine (plant) leghemoglobin. Identities: orange boxes; conservative substitution: . These sequences are 23% identical. 30
  • 31. Alignment of identities versus Blosum-62 • Alignment of identities: the probability that the alignment occurred by chance alone is high (1 in 20). • Blosum-62: the probability that the alignment occurred by chance alone is very low (1 in 300), which supports a firmer conclusion.
  • 32. Sequence analysis – rule of thumb • For sequences longer than 100 amino acids, sequence identity > 25% = statistically significant similarity = sequences are probably homologous. • If 2 sequences are less than 15% identical = pairwise comparison alone is unlikely to indicate statistically significant similarity. • If identity is between 15% and 25%, further analysis is needed. • The lack of a statistically significant degree of sequence similarity does not rule out homology. Why?
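The rule of thumb on slide 32 can be written as a small decision function. This is a sketch only: the function name and the returned labels are hypothetical, and the thresholds are the ones stated on the slide.

```python
def interpret_identity(percent_identity, length):
    """Apply the slide's rule of thumb to a pairwise comparison.

    percent_identity: percent of identical residues in the alignment.
    length: sequence length in amino acids; the rule is stated only
            for sequences longer than 100 residues.
    """
    if length <= 100:
        return "rule of thumb not applicable"
    if percent_identity > 25:
        return "probably homologous"
    if percent_identity < 15:
        return "not significant by pairwise comparison alone"
    return "further analysis needed"


# Hemoglobin alpha vs myoglobin values from slide 21.
print(interpret_identity(25.9, 147))
```

A result in the lowest band does not rule out homology; it only means that pairwise sequence comparison alone cannot establish it.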
  • 33. Homology vs similarity • Similarity refers to the likeness or % identity between 2 sequences • Similarity means sharing a statistically significant number of amino acids • Similarity does not imply homology • Homology refers to shared ancestry • Two sequences are homologous if they are derived from a common ancestral sequence • Homology usually implies similarity • Homology among proteins is often incorrectly concluded on the basis of sequence similarity. High sequence similarity might occur because of convergent evolution, or, as with shorter sequences, because of chance. Such sequences are similar but not homologous.
  • 34. Databases can be searched to identify homologous sequences • Database search for homologous seq: using online resources on NCBI (National Center for Biotechnology Information) • Procedure: BLAST (Basic Local Alignment Search Tool) search. • Result: a list of sequence alignments. • Open reading frame (ORF): protein-coding region • Hypothetical protein: ORF with no assigned function 34
  • 35. E value (highlighted in red): the number of sequences with this level of similarity expected to be in the DB by chance is 2 × 10^-25
  • 36. Examination of 3-D structure enhances our understanding of evolutionary relationship • To gain a deeper understanding of evolutionary relationships between proteins, we must examine 3-D structures because – The sequences of many proteins that have been descended from a common ancestor have diverged to such an extent that the relationship between the proteins can no longer be detected from their sequences alone. – Biomolecules generally function as intricate 3-D structures rather than as linear polymers. – Sequence mutation affected function & function directly related to tertiary structure 36
  • 37. Tertiary structure is more conserved than primary structure • Because a protein's 3-D structure is much more closely associated with function than its sequence is, tertiary structure is more evolutionarily conserved than primary structure. • i.e.: the tertiary structures of the globins are extremely similar even though the similarity between human myoglobin & lupine leghemoglobin is just barely detectable at the seq level & that between human hemoglobin and lupine leghemoglobin is not statistically significant.
  • 38. Conservation of 3-D structure. The tertiary structures of human hemoglobin, human myoglobin, & lupine leghemoglobin are conserved. This structural similarity firmly establishes that the framework that binds the heme group & facilitates the reversible binding of oxygen has been conserved over a long evolutionary period.
  • 39. Tertiary structure is more conserved than primary structure • Comparison of 3-D structures has revealed striking similarities between proteins that were not expected to be related. • i.e.: protein actin (major component of the cytoskeleton) & heat shock protein 70 (assists protein folding inside cell) – Similar in structure, only 15.6% sequence identity – Paralogs – Different biological roles, descended from a common ancestor 39
  • 40. Structures of Actin & Hsp-70. A comparison of the identically colored elements of secondary structure reveals the overall similarity in structure despite the difference in biochemical activities.
  • 41. Conserved function sequence 41 • Regions & residues critical for protein function are more strongly conserved than are other residues. • i.e.: each type of globin contains a bound heme group with an iron atom at its center. A histidine residue that interacts directly with this iron is conserved in all globins. Identified key residues/highly conserved sequences within a family of proteins identify other family members even when the overall level of sequence similarity is below statistical significance.
  • 42. Divergent and Convergent evolution • Divergent evolution: process by which 2 or more biological characteristics have a common origin, but have diverged over evolutionary time. How might two unrelated proteins come to resemble each other structurally? Two proteins evolving independently may have converged on a similar structure in order to perform a similar biochemical activity. • Convergent evolution: process by which very different evolutionary pathways lead to the same solution (different origin points). 42
  • 43. One example of convergent evolution is found among the serine proteases, enzymes that cleave peptide bonds by hydrolysis. The structures of the active sites at which the hydrolysis reaction takes place are remarkably similar.
  • 44. The similarity might suggest that these proteins are homologous. However, striking differences in the overall structures of these proteins make an evolutionary relationship extremely unlikely.
  • 45. Evolutionary trees can be constructed on the basis of sequence information • Aligned sequences can be used to construct an evolutionary tree in which the length of the branch connecting each pair of proteins is proportional to the number of amino acid differences between the sequences. Branch lengths indicate genetic change, i.e. the longer the branch, the more genetic change has occurred. • To estimate the approximate dates of gene duplications & other evolutionary events, an evolutionary tree can be calibrated by comparing the deduced branch points with divergence times determined from the fossil record.
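The quantity that sets the branch lengths, the number of amino acid differences between each pair of aligned sequences, can be tabulated with a short Python sketch. The function names and the three toy sequences are hypothetical, and the sketch stops at the difference table; it does not build the tree itself.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def pairwise_differences(aligned):
    """Map each pair of names to its number of amino acid differences."""
    return {
        (n1, n2): count_differences(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment, for illustration only.
toy = {"A": "MKVLA", "B": "MKVLG", "C": "MRILG"}
table = pairwise_differences(toy)
closest = min(table, key=table.get)
print(table, closest)
```

In a tree built from such a table, the pair with the fewest differences would be joined by the shortest branches, and more divergent sequences would branch off earlier.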
  • 46. An evolutionary tree for globins. The branching structure was deduced by sequence comparison, whereas the results of fossil studies provided the overall time scale showing when divergence occurred.
  • 47. Evolutionary tree can be constructed on the basis of sequence information How can we estimate the approximate dates of gene duplications and other evolutionary events? • Duplication leading to the 2 chains of hemoglobin appears to have occurred 350 million years ago. – This estimation is supported by the observation that jawless fish such as the lamprey, which diverged from bony fish ~400 million years ago, contain hemoglobin built from a single type of polypeptide chain. 47 The lamprey
  • 48. Modern techniques make the experimental exploration of evolution possible • Ancient DNA can sometimes be amplified and sequenced using polymerase chain reaction (PCR) and DNA sequencing. • This approach has been applied to mitochondrial DNA from a Neanderthal fossil estimated at between 30,000 and 100,000 years of age found near Düsseldorf, Germany, in 1856. Comparison with the sequences from Homo sapiens revealed between 22 and 36 substitutions, considerably fewer than the average of 55 differences between human beings and chimpanzees over the common bases in this region. 48
  • 49. Modern techniques make the experimental exploration of evolution possible • Further analysis suggested that the common ancestor of modern human beings and Neanderthals lived approximately 600,000 years ago. • An evolutionary tree constructed by using these and other data revealed that the Neanderthal was not an intermediate between chimpanzees and human beings but, instead, was an evolutionary "dead end" that became extinct. • Successful sequencing of ancient DNA requires sufficient DNA for reliable amplification and the rigorous exclusion of all sources of contamination.
  • 50. Archeological sites in Indonesia • Homo floresiensis ("Flores Man"; nicknamed "hobbit") is an extinct species thought to be in the genus Homo. The remains of an individual (1.1 m in height) were discovered in 2003 at Liang Bua on the island of Flores in Indonesia. • This hominin had originally been considered to be remarkable for its survival until only 12,000 years ago. However, by 2016, more work has pushed their existence back to 50,000 years ago. 50
  • 51. Glossary • BLOSUM – Blocks Substitution Matrix. A substitution matrix in which scores for each position are derived from observations of the frequencies of substitutions in blocks of local alignments in related proteins. Each matrix is tailored to a particular evolutionary distance. In the BLOSUM62 matrix, for example, the alignment from which scores were derived was created using sequences sharing no more than 62% identity. • Alignment – The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
  • 52. • Juxtaposition – The act of placing two or more things side by side, or the state of being so placed. • E value – Expectation value. The number of different alignments with scores equivalent to or better than a given raw score that are expected to occur in a database search by chance. The lower the E value, the more significant the score. • Substitution – The presence of a non-identical amino acid at a given position in an alignment. If the aligned residues have similar physico-chemical properties, the substitution is said to be "conservative". • Conservation – Changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.
  • 53. • Identity – The extent to which two (nucleotide or amino acid) sequences are invariant. • Gap – A space introduced into an alignment, or a position at which a letter is paired with a null. • Similarity – The extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive matrix score. • Query – The input sequence (or other type of search term) with which all of the entries in a database are to be compared.
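The glossary terms identity, gap, similarity, and conservative substitution can be made concrete with a short calculation. The sketch below is illustrative only. It assumes a pre-aligned pair of protein sequences, uses the full alignment length (gap columns included) as the denominator, and replaces a real substitution matrix such as BLOSUM62 with a hand-picked set of residue groups (`CONSERVATIVE_GROUPS`), which is a simplification and not how BLAST scores similarity. All names and the example sequences are hypothetical.

```python
# Assumption: simplified residue groups standing in for "positive matrix
# score" pairs; a real tool would look up each pair in a substitution matrix.
CONSERVATIVE_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST")


def is_conservative(a: str, b: str) -> bool:
    """Return True if both residues fall into the same simplified group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for an aligned pair.

    Gap columns ('-') count toward the alignment length but never toward
    identity or similarity.
    """
    if not aln_a or len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = 0
    similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # a gap pairs a letter with a null
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1  # conservative substitution
    length = len(aln_a)
    return 100.0 * identical / length, 100.0 * similar / length


# Hypothetical aligned peptides, not taken from any database.
ident, sim = identity_and_similarity("MKLV-DE", "MRLIADQ")
print(f"identity = {ident:.1f}%, similarity = {sim:.1f}%")
```

In this toy alignment, K/R and V/I count as conservative substitutions, E/Q does not under the simplified groups, and the single gap column lowers both percentages because it is part of the alignment length.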
  • 54. Summary 1. Homologs are descended from a common ancestor. 2. Statistical analysis of sequence alignments can detect homology. 3. Examination of three-dimensional structure enhances our understanding of evolutionary relationships. 4. Evolutionary trees can be constructed on the basis of sequence information.
  • 55. Study questions 1. What are the differences between paralogs and orthologs? 2. How can we study the function of a novel gene using sequence alignment? 3. Why is it possible for two similar sequences not to be homologous? 4. Why does protein sequence comparison produce more accurate results than nucleotide sequence comparison? 5. Why is the tertiary structure of a protein more evolutionarily conserved than its primary structure? 6. What is a conservative substitution? 7. What is a sequence alignment? 8. What online tool can be used to search for homologous sequences?
  • 56. How confident can we be that orthologs are similar, but paralogs differ? • The idea that orthologs share similar functions, whereas paralogs have different functions, has become widely accepted and is the standard textbook model, as exemplified by the ‘Phylogenetics Factsheet’ of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/About/primer/phylo.html). • However, newer evidence suggests that orthologs and paralogs are not so different in either their evolutionary rates or their mechanisms of divergence. • Thus, functional change between orthologs might be as common as between paralogs, and future studies should be designed to test the impact of duplication against this alternative model. Studer and Robinson-Rechavi (2009)