This document discusses bioinformatics tools for mutation detection and sequence analysis. It describes how sequencing can identify genes and mutations, and how sequence alignment helps find similarities between sequences. It then discusses a specific mutation in the filaggrin gene that is linked to skin and respiratory conditions. Bioinformatics tools like ORFPredictor and sequence retrieval programs are used to analyze the filaggrin gene sequence and identify similar proteins in other species that may aid in future therapeutics.
1. M.Sc. II Biotechnology
Animal Biotechnology
Dariyus Z Kabraji
Bioinformatics in Mutation Detection
2. Introduction
• A mutation is a permanent change in the nucleotide
sequence of the genome of an organism, virus,
extrachromosomal DNA, or other genetic element.
• Mutations result from unrepaired damage to DNA or to
RNA genomes (typically caused by radiation or
chemical mutagens), from errors during
replication, or from the insertion or deletion of
segments of DNA by mobile genetic elements.
• A mutation can have no effect, alter the product of a gene, or
prevent the gene from functioning properly or
completely.
3. Sequence Analysis
• Sequencing can be used to find genes: segments
of DNA that code for a specific protein or
phenotype.
• Once a region of DNA has been sequenced, it can be
screened for characteristic features of genes.
• Sequence alignment is a way of arranging DNA, RNA, or
protein sequences to identify regions of similarity.
• It helps in inferring functional, structural, or
evolutionary relationships between the sequences.
• It is used to find the best-matching sequences.
4. Alignment - the Key
• Alignment is the task of locating “equivalent”
regions of two or more sequences to maximize
their similarity
WILLIAM SHAKESPEARE
WILLIAM SHAGSPUR—
Red indicates mismatches and white indicates
gaps
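The alignment task described above can be sketched with a small global aligner in the Needleman-Wunsch style. This is a minimal illustration, not the method used in the slides: the function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for readability.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("WILLIAM SHAKESPEARE", "WILLIAM SHAGSPUR")
    print(top)
    print(bottom)
    print("score:", total)
```

Gaps are written as `-` in the returned strings. Different scoring values can produce a different, equally valid alignment of the same two names.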
5. Principles of Sequence Alignment
• Alignment can reveal homology between
sequences
• Similarity is a descriptive term for the
degree of match between two sequences
• Sequence similarity does not always imply a
common function
• Conserved function does not always imply
similarity at the sequence level
• Convergent evolution: sequences are highly
similar, but are not homologous
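One simple way to put a number on the "degree of match" is percent identity over an existing alignment. The sketch below is illustrative only: the helper name `percent_identity` is hypothetical, and dividing by the total number of alignment columns is one of several conventions in use. A high value describes similarity; it does not by itself establish homology or shared function.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    # A column counts as identical only if the residues match and neither is a gap.
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / columns


if __name__ == "__main__":
    print(percent_identity("ACGT", "AC-T"))
```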
6. Filaggrin Gene Mutation
• Filaggrins are filament-associated proteins which
bind to keratin fibers in epithelial cells
• Normally found in large quantities in the
outermost layers of the skin
• Filaggrin is essential for skin barrier function, helping to
form a protective layer at the surface of the skin
• Individuals with a truncation mutation in the gene
coding for filaggrin are strongly predisposed to a
severe form of dry skin (ichthyosis vulgaris),
asthma, and eczema
8. Objective of Sequence Analysis in this Case
• To detect the faulty filaggrin gene, which may
cause eczema and asthma
• To find similar human proteins which
may have functional similarity with filaggrin
• To find identical or similar proteins from
other species which may be helpful in
therapeutics
9. Reading Frames (RF)
• A reading frame is a way of dividing the
sequence of nucleotides in a nucleic acid
(DNA or RNA) molecule into a set of
consecutive, non-overlapping triplets.
• Where these triplets equate to amino acids or
stop signals during translation, they are called
codons.
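The division into triplets can be sketched in a few lines. This is a minimal example with a hypothetical helper name (`codons`); it covers only the three frames of one strand, and the other three frames would come from applying the same function to the reverse complement.

```python
def codons(seq, frame=0):
    """Split seq into consecutive, non-overlapping triplets starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    # Stop early enough that every triplet is complete.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


if __name__ == "__main__":
    for frame in range(3):
        print(frame, codons("ATGGCCT", frame))
```

The same nucleotide string yields three different codon lists, which is why the choice of reading frame changes the predicted protein.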
11. Open Reading Frame (ORF)
• An open reading frame (ORF) is the part of a reading frame that has
the potential to code for a protein or peptide.
• An ORF is a continuous stretch of codons beginning with a start
codon (usually ATG) and ending with a stop codon (usually TAA,
TAG or TGA). One common use of open reading frames is as one
piece of evidence to assist in gene prediction.
• Long ORFs are often used, along with other evidence, to initially
identify candidate protein coding regions in a DNA sequence.
Three tools for identifying ORFs:
1. ORF Finder
2. ORF Investigator
3. OrfPredictor
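The ORF definition above (a start codon followed in frame by a stop codon) can be sketched as a simple scan. This is an illustration under stated assumptions, not the algorithm of any of the listed tools: the function name `find_orfs` and the `min_codons` parameter are hypothetical, only the forward strand is scanned, and ATG is the only start codon considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (frame, start, end, orf_sequence) tuples for the forward strand.

    start and end are 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop codon, start codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

Raising `min_codons` filters out short ORFs, in line with the point that long ORFs are the ones usually treated as candidate coding regions.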
12. OrfPredictor
• OrfPredictor is a web server designed for identifying
protein-coding regions in expressed sequence tag (EST)-
derived sequences.
• For query sequences with a hit in BLASTX, the program
predicts the coding regions based on the translation reading
frames identified in BLASTX alignments; otherwise, it
predicts the most probable coding region based on the
intrinsic signals of the query sequences.
• The output is the predicted peptide sequences in the FASTA
format, and a definition line that includes the query ID, the
translation reading frame and the nucleotide positions where
the coding region begins and ends.
• OrfPredictor facilitates the annotation of EST-derived
sequences, particularly for large-scale EST projects.
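To make the described output concrete, the sketch below writes a peptide as a FASTA record whose definition line carries the query ID, reading frame, and coding-region coordinates. The exact layout OrfPredictor uses is not given here, so the `key=value` fields, their order, and the function name `format_fasta` are assumptions for illustration only.

```python
def format_fasta(query_id, frame, start, end, peptide, width=60):
    """Build a FASTA record; the definition-line layout is hypothetical."""
    header = f">{query_id} frame={frame} start={start} end={end}"
    # Wrap the peptide sequence at a fixed line width, as is conventional for FASTA.
    lines = [peptide[i:i + width] for i in range(0, len(peptide), width)]
    return "\n".join([header] + lines)


if __name__ == "__main__":
    print(format_fasta("EST001", 2, 3, 11, "MK"))
```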
13. Sequence Retrieval
• The simplest input for RSAT (Regulatory Sequence
Analysis Tools) is a list of gene names.
• From this list, the retrieve-seq program returns
upstream, downstream, or unspliced ORF sequences
(introns and spliced ORFs will soon be supported).
• The user can specify the left and right limits of the
sequences to be retrieved.
• Default values have been selected for each genome,
depending on the average size of the intergenic regions
and mechanisms of regulation.
• Upstream sequences can be retrieved over a constant
size, but an option also allows them to be clipped in order to
avoid including coding sequences from upstream
ORFs.
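The difference between fixed-size and clipped upstream retrieval can be sketched as follows. This is not RSAT's retrieve-seq code or its option set: the function `upstream_region`, its parameters, and the 0-based forward-strand coordinates are all assumptions made for the example.

```python
def upstream_region(genome, gene_start, size, upstream_orf_end=None):
    """Return up to `size` bases immediately upstream of gene_start.

    If upstream_orf_end is given (0-based, exclusive end of the neighbouring
    upstream ORF), the region is clipped so it does not overlap that ORF.
    """
    left = max(0, gene_start - size)
    if upstream_orf_end is not None:
        # Clip: never reach back into the coding sequence of the upstream ORF.
        left = max(left, min(upstream_orf_end, gene_start))
    return genome[left:gene_start]


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"
    print(upstream_region(genome, gene_start=12, size=8))
    print(upstream_region(genome, gene_start=12, size=8, upstream_orf_end=8))
```

Without clipping, the region has a constant size wherever the genome allows it. With clipping, its length depends on how close the neighbouring ORF lies.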
14. Anticipated Prospects
• Preventive measures for susceptible people
• Therapeutics: identify similar proteins which
might have functions similar to that of filaggrin.
• Plant & animal proteins: proteins identical or similar to
filaggrin in plants and animals could be utilized for
protein supplements
• Drugs or other treatments aimed at the filaggrin
gene are still some years away, but research in this
direction gives hope to those with these distressing
conditions.
15. References
• Dipankar De, Sanjeev Handa; Filaggrin mutations and
the skin; Contact Dermatitis, 2012, Vol. 78, No. 5,
545–551
• Jacques van Helden; Regulatory Sequence Analysis
Tools; Nucleic Acids Research, 2003, Vol. 31, No. 13,
3593–3596; DOI: 10.1093/nar/gkg567
Editor's Notes
Spontaneous mutations on the molecular level can be caused by:
Tautomerism — A base is changed by the repositioning of a hydrogen atom, altering the hydrogen bonding pattern of that base, resulting in incorrect base pairing during replication.
Depurination — Loss of a purine base (A or G) to form an apurinic site (AP site).
Deamination — Hydrolysis changes a normal base to an atypical base containing a keto group in place of the original amine group. Examples include C → U and A → HX (hypoxanthine), which can be corrected by DNA repair mechanisms; and 5MeC (5-methylcytosine) → T, which is less likely to be detected as a mutation because thymine is a normal DNA base.
Slipped strand mispairing — Denaturation of the new strand from the template during replication, followed by renaturation in a different spot ("slipping"). This can lead to insertions or deletions.
Types of Alignment
• Based on completeness: global, local
• Based on number of sequences: pairwise alignment, multiple sequence alignment
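Global alignment scores two sequences end to end, while local alignment looks for the best-matching region within them. A local alignment can be sketched in the Smith-Waterman style as below; the function name `local_align` and the scoring values (match +2, mismatch -1, gap -1) are assumptions for illustration.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment of a and b."""
    n, m = len(a), len(b)
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                        # start a fresh local alignment
                H[i - 1][j - 1] + pair,   # extend with a match or mismatch
                H[i - 1][j] + gap,        # gap in b
                H[i][j - 1] + gap,        # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until the score drops to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(local_align("TTTACGTTT", "GGACGGG"))
```

The only structural differences from a global aligner are the floor at zero in each cell and the traceback starting from the best cell instead of the corner.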
At least in a subset of those with asthma, the filaggrin gene defect may be the fundamental predisposing factor not only for the development of eczema but also for asthma.
ORF Finder is run by NCBI, and ORF Investigator is software, not a web server
Unfortunately, though it is the most user-friendly, OrfPredictor is shutting down after September 30th this year
All these tools facilitate sequence retrieval, which is essentially our goal
Regulatory Sequence Analysis Tools (RSAT)
Entrez is the NCBI-run sequence retrieval tool