Lecture delivered by T. Ashok Kumar, Head, Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil, Thuckalay, INDIA. UGC Sponsored National Workshop on BIOINFORMATICS AND GENOME ANALYSIS for College Teachers on August 11 & 12, 2014. Organized by Centre for Bioinformatics, Department of Zoology, NMCC.
Sequence homology search and multiple sequence alignment, by AnkitTiwari354
Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal (or lateral) gene transfer event (xenologs).[1]
Homology among DNA, RNA, or proteins is typically inferred from their nucleotide or amino acid sequence similarity. Significant similarity is strong evidence that two sequences are related by evolutionary changes from a common ancestral sequence. Alignments of multiple sequences are used to indicate which regions of each sequence are homologous.
Sequence Alignment: Pairwise Alignment, by naveed ul mushtaq
Topics: pairwise alignment (global and local alignment, the two types of alignment), progressive programs for multiple sequence alignment, BLOSUM, Point Accepted Mutation (PAM), PAM vs. BLOSUM
Secondary Structure Prediction of Proteins, by Vijay Hemmadi
Secondary structure prediction has been around for almost a quarter of a century. The early methods suffered from a lack of data: predictions were performed on single sequences rather than on families of homologous sequences, and there were relatively few known 3D structures from which to derive parameters. Probably the most famous early methods are those of Chou & Fasman, Garnier, Osguthorpe & Robson (GOR) and Lim. Although the authors originally claimed quite high accuracies (70-80%), careful examination showed the methods to be only 56 to 60% accurate (see Kabsch & Sander, 1984 given below). An early problem in secondary structure prediction was that structures used to derive the parameters were also included in the set of structures used to assess the accuracy of the method.
Some good references on the subject:
Global and local alignment (bioinformatics), by Pritom Chaki
A general global alignment technique is the Needleman–Wunsch algorithm, which is based on dynamic programming. Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context.
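The dynamic-programming idea can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for the example, and the local variant shown next to the Needleman–Wunsch recurrence is the Smith-Waterman recurrence, which the text above does not name.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (assumed linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        dp[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score: cells are clamped at 0 and the maximum cell wins."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(0, diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            best = max(best, dp[i][j])
    return best
```

The global score penalizes every unaligned end, whereas the local score only reports the best-matching region, which is why local alignment suits dissimilar sequences that share a motif.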
A brief introduction to two techniques used to study protein interactions: the yeast two-hybrid (Y2H) system and chromatin immunoprecipitation (ChIP).
A scoring system is a set of values that quantifies how likely one residue is to be substituted by another in an alignment.
It is also known as a substitution matrix.
The scoring matrix for nucleotides is relatively simple.
A positive value (high score) is given for a match and a negative value (low score) for a mismatch.
Scoring matrices for amino acids are more complicated because the scores have to reflect the physicochemical properties of the amino acid residues.
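As a small illustration of the nucleotide case, the match/mismatch rule can be written out as an explicit matrix. The values +1 and -1 are assumptions for the sketch; real tools let the user choose them.

```python
NUCLEOTIDES = "ACGT"


def make_nucleotide_matrix(match=1, mismatch=-1):
    """Build a 4x4 substitution matrix as a dict keyed by (residue, residue)."""
    return {
        (x, y): (match if x == y else mismatch)
        for x in NUCLEOTIDES
        for y in NUCLEOTIDES
    }


matrix = make_nucleotide_matrix()
print(matrix[("A", "A")], matrix[("A", "G")])
```

An amino acid matrix has the same shape (20x20 instead of 4x4), but its entries differ from pair to pair instead of taking only two values.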
Once a genome has been sequenced, the first question that comes to mind is "Where are the genes?". Genome annotation is the process of attaching information to the biological sequences. It is an active area of research, and knowing the coding parts of a genome helps scientists considerably in planning their wet lab projects.
The experimental methods used by biotechnologists to determine the structures of proteins demand sophisticated equipment and time.
A host of computational methods have been developed to predict the location of secondary structure elements in proteins, complementing experimental results or providing insight into them.
The Chou-Fasman algorithm is an empirical algorithm developed for the prediction of protein secondary structure.
This deck covers the bioinformatics tool BLAST (Basic Local Alignment Search Tool). BLAST performs in-silico hybridisation to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. The presentation also covers the input and output formats, the BLAST process, and the types of BLAST.
Sequence alignment & comparative genomics
▶ What’s the difference between homology and analogy?
▶ How is homology estimated?
▶ Where do sequence similarity scores come from?
▶ What are the BLOSUM protein scoring matrices?
▶ How are insertions and deletions scored?
Lab talk 190210 efficacy studies on radioligand hits_beginnings of fret assay...Laurence Dawkins-Hall
1. Titration of REL 1 antagonist hits identified by radio mimetic assay to quantify efficacy (IC50)
2. Exposition of HT FRET assay principles for large scale compound library screens to empirically identify REL 1 antagonist hits
2. A protein sequence from species A
◦ Which species has the most similar protein?
◦ Where did it originate?
◦ What is its putative function?
◦ Does it contain a conserved motif, etc.?
3. BLAST (Basic Local Alignment Search Tool)
◦ NCBI BLAST
◦ WU-BLAST
◦ PSI-BLAST
FASTA
SSEARCH
4. Heuristic (educated guess)
Does not compare the sequences in their entirety.
Quickly locates short matches (seeds)
Seed length is set by the word size
Seeds are extended in both directions
A score threshold is defined
◦ > Threshold -> keep the alignment
◦ < Threshold -> discard the alignment
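The seeding step can be sketched as a word lookup. This is a simplified illustration that only finds exact word matches; the function names and the default word size of 3 are assumptions made for the example, and real BLAST also considers similar (neighbourhood) words for proteins.

```python
from collections import defaultdict


def index_words(sequence, word_size):
    """Map every word (k-mer) of the sequence to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where the two sequences share a word."""
    subject_index = index_words(subject, word_size)
    seeds = []
    for q in range(len(query) - word_size + 1):
        word = query[q:q + word_size]
        for s in subject_index.get(word, []):
            seeds.append((q, s))
    return seeds


print(find_seeds("ACGT", "TTACGG"))
```

Because only positions that share a word are examined further, most of the subject sequence is never compared residue by residue, which is where the speed of the heuristic comes from.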
6. A Query sequence:
◦ Nucleotide
◦ Protein
A Target Database
◦ Nucleotide
◦ Protein
Blast Program
◦ Blastn
◦ Blastp
◦ tBlastx (slowest; translated nucleotide query against a translated nucleotide database)
◦ tBlastn (protein query against a translated nucleotide database)
◦ Blastx (translated nucleotide query against a protein database)
7. E value -> expected number of hits with an equal or better score that would occur by chance in a database of the same size
Score -> similarity score.
◦ Analogy: the probability of rain by chance is 0.001
◦ Passing by chance, etc.
◦ The lower the E value, the more significant the alignment.
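The relation between score and E value can be made concrete with the Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The sketch below is illustrative only: the default values of lambda and K are assumed example parameters (they depend on the scoring matrix and gap penalties in a real search), and the function names are hypothetical.

```python
import math


def expected_hits(raw_score, query_len, db_len, lam=0.3176, k=0.134):
    """E value: expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=0.3176, k=0.134):
    """Normalized score, so that E = query_len * db_len * 2 ** (-bit_score)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def p_value(e_value):
    """Probability of seeing at least one chance hit, given the E value."""
    return -math.expm1(-e_value)


e = expected_hits(raw_score=50, query_len=250, db_len=1_000_000)
print(e, bit_score(50), p_value(e))
```

Two properties follow directly from the formula: a higher score lowers the E value exponentially, and doubling the database size doubles it.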
8. Remove low-complexity regions
Generate all the k-mers (words) of the query.
List all possible matching words.
- BLAST keeps only high-scoring pairs
- FASTA stores all pairs irrespective of their scores.
Extend the matches into high-scoring segment pairs (HSPs)
Evaluate the results against the thresholds set.
Extend HSPs and join them together.
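The extension step can be sketched as an ungapped walk outward from a word hit. This is a simplified illustration under stated assumptions: match/mismatch scores of +1/-1, an "X-drop" rule that stops extending once the running score falls more than x_drop below the best seen, and a hypothetical function name. It is not the exact procedure of any BLAST version.

```python
def extend_seed(query, subject, q_start, s_start, word_size,
                match=1, mismatch=-1, x_drop=2):
    """Extend an exact word hit in both directions without gaps.

    Returns (score, q_begin, q_end), where query[q_begin:q_end] is the
    query side of the resulting high-scoring segment pair.
    """
    def extend(q, s, step):
        best = running = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += match if query[q] == subject[s] else mismatch
            length += 1
            if running > best:
                best, best_len = running, length   # new best extension
            elif best - running > x_drop:
                break                              # score dropped too far
            q += step
            s += step
        return best, best_len

    right_score, right_len = extend(q_start + word_size, s_start + word_size, 1)
    left_score, left_len = extend(q_start - 1, s_start - 1, -1)
    score = word_size * match + left_score + right_score
    return score, q_start - left_len, q_start + word_size + right_len


score, begin, end = extend_seed("TTACGTAA", "GGACGTCC", 2, 2, 3)
keep = score >= 4   # assumed reporting threshold
```

Only segments whose score reaches the chosen threshold would be kept for the final evaluation.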
14. Substitution Matrices
Insertions and deletions are less likely than substitutions
An insertion or deletion in a coding DNA sequence can cause a frameshift.
PAM matrices (Point Accepted Mutation matrices)
Margaret Dayhoff, 1978
PAM1 -> expected substitution rates when 1% of the amino acids have changed
BLOSUM: Blocks Substitution Matrix (the number is the % identity threshold of the blocks used)
15. PAM matrices are based on a simple evolutionary model
MATLFC MLTLCC
M(A/L)TL(F/C)C Two changes
Ancestral sequence?
• Only mutations are allowed
• Sites evolve independently
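Because sites are assumed to evolve independently and only by mutation, higher PAM matrices are obtained by applying the PAM1 mutation probabilities repeatedly, i.e. by raising the PAM1 probability matrix to a power. The sketch below shows this with a toy three-letter alphabet; the matrix values are invented for illustration and are not Dayhoff's data.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy "PAM1": each residue stays the same with probability 0.99,
# and the remaining 1% is split evenly between the two alternatives.
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_power(TOY_PAM1, 250)
```

After 250 steps the diagonal entries are far below 0.99, which reflects repeated and back mutations at the same site. The published PAM scoring matrices are log-odds values derived from such probability matrices.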
16. Guidelines for using matrices
Protein Query Length   Matrix     Open Gap   Extend Gap
>300                   BLOSUM50   -10        -2
85-300                 BLOSUM62   -7         -1
50-85                  BLOSUM80   -16        -4
>300                   PAM250     -10        -2
85-300                 PAM120     -16        -4
35-85                  MDM40      -12        -2
<=35                   MDM20      -22        -4
<=10                   MDM10      -23        -4
PAM100 ==> BLOSUM90
PAM120 ==> BLOSUM80
PAM160 ==> BLOSUM60
PAM200 ==> BLOSUM52
PAM250 ==> BLOSUM45
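The open-gap and extend-gap columns of the guidelines table describe an affine gap penalty: opening a gap is expensive, lengthening it is cheaper. A minimal sketch follows, using the BLOSUM62 row (-7, -1) as defaults. The convention assumed here charges the opening penalty once plus the extension penalty for every gap position; some tools instead charge extension only from the second position on.

```python
def affine_gap_penalty(length, gap_open=-7, gap_extend=-1):
    """Penalty for one gap of the given length (assumed: open + length * extend)."""
    if length <= 0:
        return 0
    return gap_open + length * gap_extend


for gap_length in (1, 2, 5):
    print(gap_length, affine_gap_penalty(gap_length))
```

Under this scheme one gap of length 5 costs less than five separate gaps of length 1, which matches the biological expectation that a single insertion or deletion event can span several residues.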
17. Scoring Matrices
S = [s_ij] gives the score of aligning character i with character j, for every pair i, j.
STPP
CTCA
0 + 3 + (-3) + 1 = 1
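The sum above can be reproduced with a few lines of code. Only the four pair scores used on the slide are included; they appear consistent with PAM250, but treat that attribution as an assumption. A full matrix would hold a score for every pair of the 20 amino acids.

```python
# Pair scores taken from the slide's worked example (unordered pairs).
PAIR_SCORES = {
    frozenset("SC"): 0,    # S aligned with C
    frozenset("T"): 3,     # T aligned with T
    frozenset("PC"): -3,   # P aligned with C
    frozenset("PA"): 1,    # P aligned with A
}


def alignment_score(top, bottom, scores):
    """Sum s_ij over the columns of an ungapped alignment of equal-length strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return sum(scores[frozenset((x, y))] for x, y in zip(top, bottom))


print(alignment_score("STPP", "CTCA", PAIR_SCORES))
```

Keying the table by unordered pairs reflects the fact that substitution matrices are symmetric: s_ij equals s_ji.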
Editor's Notes
Series of methods that relies on pairwise alignments