Lecture 5.pptx

Lecture 5: Bioinformatic tools for DNA technologies
Dr. Naulikha Kituyi
Department of Biological Sciences
University of Embu
Biochemistry
2020

Bioinformatics tools
SELF-TEST QUESTIONS
• Describe the variants of BLAST and their applications
• Discuss the relevance of sequence alignment
• Discuss the differences between homology and similarity
Bioinformatic tools are computer programs that analyze one or more sequences. There is a dizzying array of bioinformatic tools: some analyze sequences to find protein domains (Pfam), some search through databases of millions of sequences to find ones that are similar (BLAST), and some find potential protein-coding regions (ORF-Finder).
BLAST
• BLAST (Basic Local Alignment Search Tool) is one of the most widely used tools for obtaining information about a sequence. Comparing a DNA or protein sequence against a database to find similar sequences is one of the first things people do when trying to get immediate information about a sequence of interest. These searches allow scientists to infer the likely function of a particular gene. BLAST finds regions of similarity between the input sequence and sequences found in its databases.
• Sequence alignment is a way of arranging two or more sequences (DNA, RNA, or amino acid (protein)) to identify regions of similar character patterns.
• Sequence similarity could be a result of functional, structural, or evolutionary relationships between the sequences.
• The procedure involves searching for series of identical or similar characters/patterns in the same order between the sequences.
• Non-identical characters are aligned as mismatches or placed opposite a gap in the other sequence.
• An alignment is made between a known sequence and an unknown sequence, or between two unknown sequences.
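As a small illustration of matches, mismatches and gaps, the sketch below builds the "match line" that is usually printed between two aligned sequences. The function name `match_line` and the symbols used ("|" for identical, "." for mismatch, blank for a gap) are choices made for this example only; the two rows are assumed to be already aligned and of equal length.

```python
def match_line(row_a, row_b, gap="-"):
    """Return a marker string for two already-aligned rows.

    '|' = identical characters, '.' = mismatch, ' ' = one row has a gap.
    (Symbol choice is an assumption for this illustration.)
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    marks = []
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            marks.append(" ")   # character placed opposite a gap
        elif x == y:
            marks.append("|")   # identical characters
        else:
            marks.append(".")   # non-identical characters (mismatch)
    return "".join(marks)


row_a = "GATTACA"
row_b = "GA-TGCA"
print(row_a)
print(match_line(row_a, row_b))
print(row_b)
```

Printing the three lines one under the other shows at a glance which columns are identical, which are mismatches, and where a gap was introduced.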

Why Sequence Alignment (uses)?
• Useful in DNA and Protein sequences for:
– Discovering functional information
– Predicting molecular structure
– Discovering evolutionary relationships
• Sequences that are very much alike probably
have:
– Same function
– Similar secondary and 3-D structure (if proteins)
– Shared ancestral sequence (though not always)
• Sequence alignment enables the following:
– Annotation of new sequences
– Modeling of protein structures
– Phylogenetic analysis

The BLAST Algorithm
BLAST - Basic Local Alignment Search Tool
BLAST programs use a heuristic search algorithm. They were designed for fast database searching, with minimal sacrifice of sensitivity for distantly related sequences. The programs search databases in a special compressed format.
Variants of BLAST
BLASTN - Compares a DNA query to a DNA database. Searches both strands automatically. It is optimized for speed rather than sensitivity.
BLASTP - Compares a protein query to a protein database.
BLASTX - Compares a DNA query to a protein database by translating the query sequence in the 6 possible reading frames (3 reading frames from each strand of the DNA) and comparing each translation against the database.
TBLASTN - Compares a protein query to a DNA database, translated in the 6 possible frames of the database.
TBLASTX - Compares the protein encoded in a DNA query to the protein encoded in a DNA database, in the 6 possible frames of both query and database sequences.
BLAST2 - Also called advanced BLAST. It can perform gapped alignments.
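The "6 possible frames" used by BLASTX, TBLASTN and TBLASTX can be illustrated with a short sketch: three reading frames on the given strand and three on its reverse complement. This is only an illustration of the idea, not how BLAST itself is implemented. It assumes the standard genetic code, and the function names are made up for the example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return a dict such as {'+1': ..., '+2': ..., '-3': ...}."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

A stop codon is shown as "*". In a translated search, each of the six protein strings would be compared against the database (or the query) separately.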
General View of How the BLAST Program Works
BLAST compares the query to each sequence in the database, using heuristic rules to speed up the pairwise comparison. It creates an abstraction of each sequence by listing exact and similar words. This is done in advance for each sequence in the database, and on the fly for a given query. BLAST finds similar words between the query and each database sequence, then extends these word matches to obtain high-scoring segment pairs (HSPs). It also calculates statistics analytically, as FASTA does.
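The word-matching and extension idea can be sketched in a few lines. This is a deliberately simplified illustration and not the actual BLAST implementation: it uses exact words only (real BLAST also considers similar words scored with a substitution matrix), extends without gaps, and the scoring values, the drop-off threshold `x_drop` and all function names are assumptions chosen for the example.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def extend_hit(query, subject, qi, si, k, match=1, mismatch=-1, x_drop=2):
    """Extend an exact word hit to the right and left without gaps.

    Extension in a direction stops once the running score falls
    x_drop below the best score seen so far.
    Returns (score, query_start, query_end, subject_start).
    """
    score = best = k * match
    best_right = step = 0
    while qi + k + step < len(query) and si + k + step < len(subject):
        same = query[qi + k + step] == subject[si + k + step]
        score += match if same else mismatch
        step += 1
        if score > best:
            best, best_right = score, step
        elif best - score >= x_drop:
            break
    score = best
    best_left = step = 0
    while qi - 1 - step >= 0 and si - 1 - step >= 0:
        same = query[qi - 1 - step] == subject[si - 1 - step]
        score += match if same else mismatch
        step += 1
        if score > best:
            best, best_left = score, step
        elif best - score >= x_drop:
            break
    return best, qi - best_left, qi + k + best_right, si - best_left


def best_hsp(query, subject, k=4):
    """Return the highest-scoring ungapped segment pair, or None."""
    words = index_words(query, k)
    best = None
    for si in range(len(subject) - k + 1):
        for qi in words.get(subject[si:si + k], []):
            hsp = extend_hit(query, subject, qi, si, k)
            if best is None or hsp[0] > best[0]:
                best = hsp
    return best


hsp = best_hsp("ACGTACGT", "TTACGTACGTTT")
print(hsp)
```

Here the word index plays the role of the "sequence abstraction" mentioned above, and each shared word is a seed that is extended into a candidate HSP.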

Using BLAST for Sequence Alignment
Please attempt the following:
Go to www.ncbi.nlm.nih.gov and BLAST the Dengue virus sequence that we examined in Lecture 2 (NC_001477) using blastn. Copy and paste the sequence accession number into the search engine and click on search to confirm the identity of the sequence, then click on the blastn option to see all the search results with similar sequences. To see the alignment for the sequences and the percent identities, click on any of the sequences, like the one highlighted above, and go to alignment.
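For those who prefer to script the search, the sketch below only builds the web address for submitting the same blastn query; it does not contact NCBI. The parameter names (CMD, PROGRAM, DATABASE, QUERY) follow NCBI's BLAST URL API as commonly documented, and the database name "nt" is an assumption, so check the current NCBI documentation before relying on it. The function name is made up for this example.

```python
from urllib.parse import urlencode

# Base address of the NCBI BLAST web service.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastn_request(accession, database="nt"):
    """Build (but do not send) a URL that submits a blastn search.

    The database name is an assumption; NCBI may rename databases.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastn",   # nucleotide query vs nucleotide database
        "DATABASE": database,
        "QUERY": accession,    # accession number or raw sequence
    }
    return BLAST_URL + "?" + urlencode(params)


url = build_blastn_request("NC_001477")
print(url)
```

Opening the printed address in a browser should start the same search as the manual steps above, assuming the service still accepts these parameters.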

Terms used in Sequence Comparisons
Homologous: Two sequences that are related by descent from a common ancestor are termed homologous to each other. Homologs can be either orthologs or paralogs. Homologous proteins from two different organisms, usually with similar functions, are termed orthologs, whereas homologous proteins within the same organism, often with different functions, are termed paralogs.
Identity and similarity: The ratio of identical amino acid residues to the total number of amino acids present in the entire length of the sequence is termed identity, whereas the ratio of similar amino acids to the total number of amino acids present is termed similarity. The extent of similarity between two amino acids is calculated with a similarity matrix.
• Identity & Similarity:
– Sequence identity: exactly the same amino acid or nucleotide in the same position
– Sequence similarity: includes identities and substitutions (amino acid residues) with similar chemical properties
– Similarity is a quantifiable property: two sequences are similar if the order of sequence characters is recognizably the same and they can be aligned
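The two ratios can be computed from an existing alignment as in the sketch below. It is a simplified illustration: real programs score similarity with a substitution matrix such as BLOSUM62, whereas here "similar" simply means that both residues fall in the same chemical group. The groups, the function name and the choice of the alignment length as denominator are all assumptions made for this example.

```python
# Hypothetical grouping of amino acids by broad chemical property.
SIMILAR_GROUPS = [
    set("AVLIM"),   # aliphatic / hydrophobic
    set("FYW"),     # aromatic
    set("KRH"),     # basic
    set("DE"),      # acidic
    set("STNQ"),    # polar, uncharged
]


def identity_and_similarity(row_a, row_b, gap="-"):
    """Return (percent identity, percent similarity) of two aligned rows.

    Both percentages are taken over the full alignment length,
    so gap columns count against them.
    """
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("rows must be non-empty and of equal length")
    identical = similar = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue
        if x == y:
            identical += 1
            similar += 1          # identities also count as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(row_a)
    return 100 * identical / length, 100 * similar / length


identity, similarity = identity_and_similarity("MKVLA", "MRVIA")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

In this toy pair, K/R and L/I are conservative substitutions, so similarity is higher than identity, which is the usual pattern in real alignments.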

Global and Local Alignment
The alignment of two sequences can be global or local (Figure 4). In global alignment, the complete length of one protein sequence is compared to the other, whereas in local alignment only a part of each sequence is compared. Global alignment is used to classify proteins into different classes, whereas local alignment is used to identify a motif or domain.
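The difference can be seen by scoring the same pair of sequences both ways. The sketch below uses the classic dynamic-programming approach (Needleman-Wunsch for global, Smith-Waterman for local) and returns only the best score, not the alignment itself. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions for illustration; protein alignments would normally use a substitution matrix.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best global (default) or local alignment score of a and b.

    Global: every character of both sequences must be aligned or gapped.
    Local: cell scores never drop below 0 and the best cell anywhere wins.
    """
    # First row: global alignment pays for leading gaps, local does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


long_seq = "TTTTACGTGGGG"
motif = "ACGT"
global_score = alignment_score(long_seq, motif)
local_score = alignment_score(long_seq, motif, local=True)
print("global:", global_score, "local:", local_score)
```

The short motif is found perfectly by the local alignment, while the global score is heavily penalised by the gaps needed to span the whole of the longer sequence.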

Sequence Alignment Example: Homology
              10        20        30        40        50        60
HUMAN MNPLLILTFVAAALAAPFDDDDKIVGGYNCEENSVPYQVSLNSGYHFCGGSLINEQWVVS
      :. ::::..:.::.: :..:::::::::.: :.:::::::::::::::::::::.:::::
RAT   MSALLILALVGAAVAFPLEDDDKIVGGYTCPEHSVPYQVSLNSGYHFCGGSLINDQWVVS
              10        20        30        40        50        60

              70        80        90       100       110       120
HUMAN AGHCYKSRIQVRLGEHNIEVLEGNEQFINAAKIIRHPQYDRKTLNNDIMLIKLSSRAVIN
      :.::::::::::::::::.::::.::::::::::.::.:.  ::::::::::::: . .:
RAT   AAHCYKSRIQVRLGEHNINVLEGDEQFINAAKIIKHPNYSSWTLNNDIMLIKLSSPVKLN
              70        80        90       100       110       120

             130       140       150       160       170       180
HUMAN ARVSTISLPTAPPATGTKCLISGWGNTASSGADYPDELQCLDAPVLSQAKCEASYPGKIT
      :::. ..::.:   .::.::::::::: :.:.. :: :::.:::::::: :::.:::.::
RAT   ARVAPVALPSACAPAGTQCLISGWGNTLSNGVNNPDLLQCVDAPVLSQADCEAAYPGEIT
             130       140       150       160       170       180

             190       200       210       220       230       240
HUMAN SNMFCVGFLEGGKDSCQGDSGGPVVCNGQLQGVVSWGDGCAQKNKPGVYTKVYNYVKWIK
      :.:.::::::::::::::::::::::::::::.:::: :::  ..::::::: :.: ::.
RAT   SSMICVGFLEGGKDSCQGDSGGPVVCNGQLQGIVSWGYGCALPDNPGVYTKVCNFVGWIQ
             190       200       210       220       230       240

HUMAN NTIAAN
      .:::::
RAT   DTIAAN
Human (247 aa) vs rat (246 aa) trypsin: 76.4% identity (91.9% similarity) in a 246 aa overlap (1-246:1-246), E(1) < 2e-86
The similarity is statistically significant (greater than expected by chance), so the sequences can be considered homologous.

Detection of CpG Islands
Detection of methylated CpG islands in easily accessible biological materials such as serum has the potential to be useful for the early diagnosis of cancer. Most currently used methods for detecting methylated CpG islands are based on sodium bisulfite conversion of genomic DNA, followed by PCR.
CpG Islands
CpG sites (or CG sites) are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands (or CG islands).
Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated.[1] Methylating the cytosine within a gene can change its expression, a mechanism studied in epigenetics, which is part of the larger field of gene regulation.
In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island.
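Computationally, CpG islands are usually located from sequence composition rather than from methylation data. The sketch below computes the GC content and the observed/expected CpG ratio of a stretch of DNA and applies the commonly quoted thresholds (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6). These thresholds and the function names are not from this lecture; they are stated here as assumptions for illustration.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")   # C immediately followed by G (5' to 3')
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Expected CpG count if C and G were placed independently: C * G / length
    if c_count and g_count:
        obs_exp = cpg_count * length / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to one candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp


candidate = "CG" * 100      # artificial, extremely CpG-rich region
background = "AT" * 100     # artificial region without any CpG
print(cpg_stats(candidate), looks_like_cpg_island(candidate))
print(cpg_stats(background), looks_like_cpg_island(background))
```

In practice such a test would be applied in a sliding window along a genome. Note that it says nothing about whether the island is methylated; that still requires experimental methods such as bisulfite conversion.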
